Hi Dawid, I figured somebody who really understands class loaders would be able to improve on my initial implementation. I don't have a small test case for this at the moment, but you should be able to duplicate it easily by creating a new DistanceMeasure in a test project and then calling the CanopyClusteringJob as in the code fragment below. You can reuse some of the Canopy test case code to populate your initial dataset.
BTW, the original code worked fine when running locally from Eclipse, and I only saw the failures when running on a remote cluster. Evidently, Eclipse's classpath environment is different than that of a deployed map task.

Jeff

-----Original Message-----
From: Dawid Weiss [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 06, 2008 12:51 AM
To: [email protected]
Subject: Re: Class Loader Problem

Hi guys,

I just looked at the code and noticed you use Class-relative classloader:

Class cl = Class.forName(job.get(DISTANCE_MEASURE_KEY));

This is effectively an attempt to load a class using the caller's class class loader (the class loader is loaded via ClassLoader.getCallerClassLoader()). Usually it makes more sense to use thread's context class loader (they may be different), so:

Thread.currentThread().getContextClassLoader().loadClass(...);

I teach classes today, but I'll review the code and see if I can fix it. Jeff, would you by any chance have an assembled-and-ready test case or example that causes this problem?

Dawid

Ted Dunning wrote:
> Hmmm...
>
> Is there a more elegant way to go here? Is there a way for the
> CanopyClusteringJob to infer which jar by looking at the class? I think
> that Hadoop does something like this via the class loader.
>
> This current method looks ripe for very obscure bugs.
>
>
> On 3/5/08 4:49 PM, "Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>
>> I changed the main's to pass in the location of the jar, since the ANT
>> task puts the jar in basedir/dist. I made a comment about it on
>> Mahout-3. The Canopy driver should do the right thing????? I also
>> did the same thing w/ the k-means.
>>
>>
>> On Mar 5, 2008, at 2:52 PM, Jeff Eastman wrote:
>>
>>> Here's my job driver, it works fine with ManhattanDistanceMeasure but
>>> not SystemLoadDistanceMeasure.
>>>
>>> Jeff
>>>
>>> public static void main(String[] args) {
>>>   String input = args[0];
>>>   String output = args[1];
>>>   int t1 = new Integer(args[2]);
>>>   int t2 = new Integer(args[3]);
>>>   JobConf conf = new JobConf(
>>>       com.collabnet.hadoop.systemload.access.DriverA.class);
>>>   Path outPath = new Path(output);
>>>   try {
>>>     FileSystem dfs = FileSystem.get(conf);
>>>     if (dfs.exists(outPath))
>>>       dfs.delete(outPath);
>>>     DriverA.runJob(input, output);
>>>     DriverP.runJob(input, output);
>>>     DriverC.runJob(output, output);
>>>     CanopyClusteringJob.runJob(output + "/combined", output,
>>>         SystemLoadDistanceMeasure.class.getName(), t1, t2,
>>>         "apache-mahout-0.1-dev.jar");
>>>     DriverS.runJob(output + "/clusters", output);
>>>   } catch (IOException e) {
>>>     e.printStackTrace();
>>>   }
>>> }
>>>
>>> -----Original Message-----
>>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
>>> Sent: Wednesday, March 05, 2008 11:44 AM
>>> To: [email protected]
>>> Subject: Re: Class Loader Problem
>>>
>>>
>>> Where is your code?
>>>
>>>
>>> On 3/5/08 11:28 AM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote:
>>>
>>>> I'm wondering if you can see anything
>>>> wrong with my packaging or, perhaps, how the Canopy class is going about
>>>> instantiating it.
>>
>
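
For anyone following the thread, here is a minimal, self-contained sketch of the context-class-loader approach Dawid describes above. It is only an illustration, not the code that is or will be in Mahout: the class name ContextClassLoading and the helpers loadByName and newInstance are hypothetical, and java.util.ArrayList stands in for a user-supplied DistanceMeasure so the example runs without Hadoop or Mahout on the classpath.

```java
import java.util.List;

// Hypothetical helper, not part of Mahout or Hadoop.
public class ContextClassLoading {

    /**
     * Loads a class by name, preferring the current thread's context class
     * loader and falling back to the loader that defined this helper class.
     */
    public static Class<?> loadByName(String className) throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // The context loader may be unset; use this class's own loader instead.
            loader = ContextClassLoading.class.getClassLoader();
        }
        // initialize=true matches the behaviour of the one-argument Class.forName.
        return Class.forName(className, true, loader);
    }

    /** Loads the named class and instantiates it through its no-argument constructor. */
    public static <T> T newInstance(String className, Class<T> expectedType)
            throws ReflectiveOperationException {
        Class<?> raw = loadByName(className);
        // asSubclass fails fast with ClassCastException if the type is wrong.
        return raw.asSubclass(expectedType).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // ArrayList is a stand-in for a class named in the job configuration.
        List<?> list = newInstance("java.util.ArrayList", List.class);
        System.out.println(list.getClass().getName());
    }
}
```

In a job, the string passed to newInstance would be the class name read from the configuration (the value stored under DISTANCE_MEASURE_KEY in the fragment Dawid quotes). Whether the context class loader actually sees the user's jar inside a deployed map task is the open question in this thread, so treat this as a starting point for the test case rather than a confirmed fix.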
