Yeah, I agree it is a headache for the future. It is already a bit problematic in that we have to build the jar before tests are run.

On Mar 6, 2008, at 5:30 AM, Dawid Weiss wrote:


As a side note -- Hadoop uses the simplest trick possible to figure out the JAR location of the originating class -- it attempts to load a resource named after the class' bytecode...

 // From Hadoop; the enclosing class needs java.io.IOException, java.net.URL,
 // java.net.URLDecoder and java.util.Enumeration imported.
 private static String findContainingJar(Class my_class) {
   ClassLoader loader = my_class.getClassLoader();
   // turn the class name into a resource path, e.g. "org/foo/Bar.class"
   String class_file = my_class.getName().replaceAll("\\.", "/") + ".class";
   try {
     for(Enumeration itr = loader.getResources(class_file);
         itr.hasMoreElements();) {
       URL url = (URL) itr.nextElement();
       // only resources served from inside a JAR are interesting
       if ("jar".equals(url.getProtocol())) {
         String toReturn = url.getPath();
         if (toReturn.startsWith("file:")) {
           toReturn = toReturn.substring("file:".length());
         }
         toReturn = URLDecoder.decode(toReturn, "UTF-8");
         // strip the "!/path/inside/jar" suffix, leaving the jar's file path
         return toReturn.replaceAll("!.*$", "");
       }
     }
   } catch (IOException e) {
     throw new RuntimeException(e);
   }
   return null;
 }

Note the "replaceAll" line -- it truncates the inside-JAR path from the jar location. I also looked at the submitter and the isolation runner; they seem to work according to the intuition I presented earlier (the thread context class loader has pointers to the invoked JAR plus all jars under lib/), so there should be no need to specify jars explicitly. I even tend to think this is a headache for the future...
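For illustration only (not part of the original message), here is a minimal sketch of how a driver might use the helper above instead of hard-coding a jar path. CanopyDriver stands in for whatever driver class is at hand, the paths in the comments are hypothetical, and JobConf.setJar() is the standard Hadoop call for recording the job jar:

 // assumes findContainingJar() from above is available in this class;
 // needs org.apache.hadoop.mapred.JobConf imported
 JobConf conf = new JobConf();
 // a class loaded from jar:file:/home/user/dist/mahout.jar!/.../CanopyDriver.class
 // resolves to /home/user/dist/mahout.jar
 String jar = findContainingJar(CanopyDriver.class);
 if (jar != null) {
   conf.setJar(jar); // hint Hadoop at the job jar without a hard-coded path
 }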

Dawid

Dawid Weiss wrote:
I changed the mains to pass in the location of the jar, since the ANT task puts the jar in basedir/dist. I made a comment about it on Mahout-3. The Canopy driver should do the right thing? I also did the same thing with the k-means.
I honestly don't think the JAR file must be specified as part of the JobConf. This is a hint, but it's a hint used only in very special cases (which I can't think of, to be honest). To my understanding, the situation is like this:
- When you assemble a job JAR, you should package it with all required dependencies under the {jarfile.jar}/lib folder.
- All these classes are visible through the context class loader set by Hadoop, so no special JAR tricks are required. When you submit a Hadoop job (remotely), you point to the JAR file with all dependencies and Hadoop can take it from there.
- When you run the in-memory task tracker (for debugging or locally), all the classes should be available through the normal classpath, and the context class loader (again) should resolve them successfully.
Can you enlighten me when pointing an explicit JAR file for JobConf is required?
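As an aside (not from the original messages), a minimal sketch of the two ways a job jar can be hinted to JobConf; the class name and jar path below are placeholders:

 import org.apache.hadoop.mapred.JobConf;

 public class JarHintExample {
   public static void main(String[] args) {
     // Let Hadoop locate the jar containing the given class (the same trick
     // as findContainingJar above); the class used here is just this example.
     JobConf conf = new JobConf(JarHintExample.class);

     // Or point at the jar explicitly -- the path is hypothetical and would
     // correspond to wherever the ANT task drops the built jar:
     // conf.setJar("dist/apache-mahout-0.1-dev.jar");

     System.out.println("Job jar: " + conf.getJar());
   }
 }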
Dawid

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




