I have implemented a quick-and-dirty global locking scheme:

I am currently testing it, but I would like to get other people's ideas on
it:

I used RMI for this purpose:

An RMI server that implements two methods:

  boolean lock(String urlString);
  void unlock(String urlString);
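As a sketch, the remote interface for those two methods might look like this (the interface name here is my guess, not the actual code):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical name for the remote interface described above.
// Remote methods must declare RemoteException.
interface HostLockService extends Remote {
    // try to take a lock for the host of the given URL; non-blocking
    boolean lock(String urlString) throws RemoteException;

    // release a previously taken lock for that host
    void unlock(String urlString) throws RemoteException;
}
```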

The server holds a Map&lt;key, val&gt; where the key is an Integer (the host
hash) and the val is a very simplistic class:

import java.net.MalformedURLException;
import java.net.URL;

public class LockObj {
  private int hash;
  private long start;
  private long timeout;
  private int max_locks;
  private int locks = 0;

  public LockObj(int hash, long timeout, int max_locks) {
    this.hash = hash;
    this.timeout = timeout;
    start = System.currentTimeMillis();
    this.max_locks = max_locks;
  }

  // take a lock if the host is still below its cap
  // (was "locks+1 < max_locks", which stopped one short of max_locks;
  // the methods are synchronized, so no separate sync object is needed)
  public synchronized boolean lock() {
    if (locks < max_locks) {
      locks++;
      return true;
    }
    return false;
  }

  public synchronized void unlock() {
    if (locks > 0) {
      locks--;
    }
  }

  public synchronized int locks() {
    return locks;
  }

  // convert the host part of a URL to a hash;
  // on a malformed URL, fall back to hashing the raw string
  public static int make_hash(String urlString) {
    URL url = null;
    try {
      url = new URL(urlString);
    } catch (MalformedURLException e) {
      // fall through: hash the raw string below
    }
    return (url == null ? urlString : url.getHost()).hashCode();
  }

  // check if this object's timeout has been reached
  // (later: implement a listener event)
  public boolean timeout_reached() {
    return (System.currentTimeMillis() - start) > timeout;
  }

  // free all locks
  public synchronized void unlock_all() {
    locks = 0;
  }

  public int hash() {
    return hash;
  }
}
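For context, here is a minimal, self-contained sketch of the server-side bookkeeping this class supports: one counter per host hash, capped at max_locks. The class name is illustrative and the raw host string is hashed directly; the real server would key its map with LockObj.make_hash and consult timeout_reached for cleanup:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical server-side bookkeeping: one counted lock per host hash.
class HostLockServerSketch {
    private final Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
    private final int maxLocks;

    HostLockServerSketch(int maxLocks) {
        this.maxLocks = maxLocks;
    }

    // grant a lock unless the host is already at its cap
    synchronized boolean lock(String host) {
        int key = host.hashCode();
        int n = counts.containsKey(key) ? counts.get(key) : 0;
        if (n >= maxLocks) {
            return false;   // host already at max_locks
        }
        counts.put(key, n + 1);
        return true;
    }

    // release one lock for the host, never going below zero
    synchronized void unlock(String host) {
        int key = host.hashCode();
        Integer n = counts.get(key);
        if (n != null && n > 0) {
            counts.put(key, n - 1);
        }
    }
}
```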

Not the prettiest thing, but it just cleared the first barrier... it
worked!!!


I changed the FetcherThread constructor to create an instance of
SyncManager.

In the run method I also try to get a lock on the host. If not
successful, I add the URL to an ArrayList of &lt;key, datum&gt; entries for
later processing...
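The run-method logic described above might look roughly like this; the SyncManager API and the deferral list are assumptions based on the description, not the actual Nutch code:

```java
import java.util.ArrayList;
import java.util.List;

class FetchLoopSketch {
    // assumed wrapper around the RMI stub, per the description above
    interface SyncManager {
        boolean lock(String urlString);
        void unlock(String urlString);
    }

    // try each URL; fetch under the host lock if granted,
    // otherwise defer the URL for later processing
    static List<String> fetchOrDefer(SyncManager sync, List<String> urls) {
        List<String> deferred = new ArrayList<String>();
        for (String url : urls) {
            if (sync.lock(url)) {
                try {
                    // fetch(url) would go here
                } finally {
                    sync.unlock(url);   // always release the host lock
                }
            } else {
                deferred.add(url);      // retry on a later pass
            }
        }
        return deferred;
    }
}
```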

I also changed the generator to put each URL into a separate array so all
fetchlists are evenly sized.
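One simple way to even out the fetchlists is round-robin assignment; this is an illustrative sketch, not the actual generator change:

```java
import java.util.ArrayList;
import java.util.List;

class RoundRobinSketch {
    // distribute URLs across nLists lists so their sizes differ by at most one
    static List<List<String>> partition(List<String> urls, int nLists) {
        List<List<String>> lists = new ArrayList<List<String>>();
        for (int i = 0; i < nLists; i++) {
            lists.add(new ArrayList<String>());
        }
        for (int i = 0; i < urls.size(); i++) {
            lists.get(i % nLists).add(urls.get(i));  // round-robin
        }
        return lists;
    }
}
```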

I would appreciate your comments and any suggestions for improvement.

RMI is a little cumbersome, but hey... for now it works for 5 task
trackers without a problem (so it seems) :)


Gal




On Wed, 2006-02-15 at 14:55 -0800, Doug Cutting wrote:
> Andrzej Bialecki wrote:
> > (FYI: if you wonder how it was working before, the trick was to generate 
> > just 1 split for the fetch job, which then lead to just one task being 
> > created for any input fetchlist.
> 
> I don't think that's right.  The generator uses setNumReduceTasks() to 
> the desired number of fetch tasks, to control how many host-disjoint 
> fetchlists are generated.  Then the fetcher does not permit input files 
> to be split, so that fetch tasks remain host-disjoint.  So lots of 
> splits can be generated, by default one per mapred.map.tasks, permitting 
> lots of parallel fetching.
> 
> This should still work.  If it does not, I'd be interested to hear more 
> details.
> 
> Doug
> 

