Hi Dawid,

I have no problem with your proposal, but have not tested it. If the unit
tests still run then go ahead and commit it. If this means we no longer need
to provide the jar file as an argument to the job, then perhaps we can ditch
that too. If it is still needed (e.g. to provide the jar file name to the
config), then specifying the job jar resolved the classloader problem in my
deployed job.
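For the archives, my reading of the patch is that it swaps the caller's class
loader for the thread's context class loader when resolving the configured
measure. Roughly like this (a sketch only, since I haven't run the patched
code; java.util.ArrayList stands in for a user-supplied DistanceMeasure
class name):

```java
// Sketch of the MAHOUT-13 change: resolve the configured class via the
// thread's context class loader rather than the caller's class loader.
public class LoaderSketch {
    public static Object createMeasure(String className) throws Exception {
        // was: Class cl = Class.forName(className);
        ClassLoader ccl = Thread.currentThread().getContextClassLoader();
        Class<?> cl = ccl.loadClass(className);
        return cl.newInstance(); // the measure needs a public no-arg constructor
    }

    public static void main(String[] args) throws Exception {
        // "java.util.ArrayList" is a stand-in for a real DistanceMeasure
        // implementation; any class with a no-arg constructor will load.
        Object measure = createMeasure("java.util.ArrayList");
        System.out.println(measure.getClass().getName());
    }
}
```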

+0 (no strong opinion on this one)

Jeff

> -----Original Message-----
> From: Dawid Weiss [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 11, 2008 1:19 AM
> To: [email protected]
> Subject: Re: Class Loader Problem
> 
> 
> Jeff, did you have a chance to try it? Can we close this issue?
> 
> D.
> 
> Dawid Weiss wrote:
> >
> > Hi Jeff,
> >
> > Like I said -- it seems that this issue is actually quite trivial to
> > solve by changing to the context class loader. See attached patch at
> > MAHOUT-13. Please check if it works (I did some testing, but I don't
> > know your exact code). Two notes:
> >
> > 1) if you work from Eclipse, then bind projects together by project
> > reference (project properties -> java build path -> projects (tab) ->
> > add...). Then you'll be able to run jobs in your project that depend on
> > Mahout without fiddling with JARs at all,
> >
> > 2) for distributed execution, package Hadoop jars properly (Mahout
> > inside your project's jar, under lib/).
> >
> > If there are no objections, I would like to commit this soon.
> >
> > Dawid
> >
> >
> > Jeff Eastman wrote:
> >> Hi Dawid,
> >>
> >> I figured somebody who really understands class loaders would be able to
> >> improve on my initial implementation. I don't have a small test case for
> >> this at the moment, but you should be able to duplicate it easily by
> >> creating a new DistanceMeasure in a test project and then calling the
> >> CanopyClusteringJob as in the code fragment below. You can reuse some of
> >> the Canopy test case code to populate your initial dataset.
> >>
> >> BTW, the original code worked fine when running locally from Eclipse,
> >> and I only saw the failures when running on a remote cluster. Evidently,
> >> Eclipse's classpath environment is different than that of a deployed
> >> map task.
> >>
> >> Jeff
> >>
> >> -----Original Message-----
> >> From: Dawid Weiss [mailto:[EMAIL PROTECTED]
> >> Sent: Thursday, March 06, 2008 12:51 AM
> >> To: [email protected]
> >> Subject: Re: Class Loader Problem
> >>
> >>
> >> Hi guys,
> >>
> >> I just looked at the code and noticed you use a Class-relative
> >> class loader:
> >>
> >> Class cl = Class.forName(job.get(DISTANCE_MEASURE_KEY));
> >>
> >> This is effectively an attempt to load the class using the caller's
> >> class loader (obtained via ClassLoader.getCallerClassLoader()).
> >> Usually it makes more sense to use the thread's context class loader
> >> (they may be different), so:
> >>
> >> Thread.currentThread().getContextClassLoader().loadClass(...);
> >>
> >> I teach classes today, but I'll review the code and see if I can fix it.
> >> Jeff, would you by any chance have an assembled-and-ready test case or
> >> example
> >> that causes this problem?
> >>
> >> Dawid
> >>
> >>
> >> Ted Dunning wrote:
> >>> Hmmm...
> >>>
> >>> Is there a more elegant way to go here?  Is there a way for the
> >>> CanopyClusteringJob to infer which jar by looking at the class?  I think
> >>> that Hadoop does something like this via the class loader.
> >>>
> >>> This current method looks ripe for very obscure bugs.
> >>>
> >>>
> >>> On 3/5/08 4:49 PM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> I changed the mains to pass in the location of the jar, since the ANT
> >>>> task puts the jar in basedir/dist.  I made a comment about it on
> >>>> Mahout-3.  The Canopy driver should do the right thing?????  I also
> >>>> did the same thing w/ the k-means.
> >>>>
> >>>>
> >>>> On Mar 5, 2008, at 2:52 PM, Jeff Eastman wrote:
> >>>>
> >>>>> Here's my job driver; it works fine with ManhattanDistanceMeasure
> >>>>> but not SystemLoadDistanceMeasure.
> >>>>>
> >>>>> Jeff
> >>>>>
> >>>>> public static void main(String[] args) {
> >>>>>    String input = args[0];
> >>>>>    String output = args[1];
> >>>>>    int t1 = Integer.parseInt(args[2]);
> >>>>>    int t2 = Integer.parseInt(args[3]);
> >>>>>    JobConf conf = new JobConf(
> >>>>>        com.collabnet.hadoop.systemload.access.DriverA.class);
> >>>>>    Path outPath = new Path(output);
> >>>>>    try {
> >>>>>      FileSystem dfs = FileSystem.get(conf);
> >>>>>      if (dfs.exists(outPath))
> >>>>>        dfs.delete(outPath);
> >>>>>      DriverA.runJob(input, output);
> >>>>>      DriverP.runJob(input, output);
> >>>>>      DriverC.runJob(output, output);
> >>>>>      CanopyClusteringJob.runJob(output + "/combined", output,
> >>>>>          SystemLoadDistanceMeasure.class.getName(), t1, t2,
> >>>>>          "apache-mahout-0.1-dev.jar");
> >>>>>      DriverS.runJob(output + "/clusters", output);
> >>>>>    } catch (IOException e) {
> >>>>>      e.printStackTrace();
> >>>>>    }
> >>>>>  }
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> >>>>> Sent: Wednesday, March 05, 2008 11:44 AM
> >>>>> To: [email protected]
> >>>>> Subject: Re: Class Loader Problem
> >>>>>
> >>>>>
> >>>>> Where is your code?
> >>>>>
> >>>>>
> >>>>> On 3/5/08 11:28 AM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>> I'm wondering if you can see anything
> >>>>>> wrong with my packaging or, perhaps, how the Canopy class is going
> >>>>>> about instantiating it.
