Yeah, I agree it is a headache for the future. It is already a bit
problematic in that we have to build the jar before tests are run.
On Mar 6, 2008, at 5:30 AM, Dawid Weiss wrote:
As a side note -- Hadoop uses the simplest trick possible to figure
out the JAR location of the originating class -- it attempts to load
a resource named after the class' bytecode...
// (If you lift this method out on its own, it needs java.io.IOException,
//  java.net.URL, java.net.URLDecoder and java.util.Enumeration.)
private static String findContainingJar(Class my_class) {
  ClassLoader loader = my_class.getClassLoader();
  // Turn the class name into its bytecode resource name, e.g. "org/foo/Bar.class".
  String class_file = my_class.getName().replaceAll("\\.", "/") + ".class";
  try {
    for (Enumeration itr = loader.getResources(class_file); itr.hasMoreElements();) {
      URL url = (URL) itr.nextElement();
      if ("jar".equals(url.getProtocol())) {
        String toReturn = url.getPath();
        if (toReturn.startsWith("file:")) {
          toReturn = toReturn.substring("file:".length());
        }
        toReturn = URLDecoder.decode(toReturn, "UTF-8");
        // Strip the in-JAR part ("!/org/foo/Bar.class"), leaving only the JAR path.
        return toReturn.replaceAll("!.*$", "");
      }
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
  return null;
}
Note the "replaceAll" line -- it strips the in-JAR path from the JAR location, leaving just the JAR file (sketched below). I also looked at the job submitter and the isolation runner; they seem to work according to the intuition I presented earlier (the thread context class loader points at the invoked JAR plus all the JARs under its lib/ folder), so there should be no need to specify JARs explicitly. I even tend to think doing so is a headache for the future...
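To make that truncation concrete, here is a tiny standalone sketch on a made-up "jar:" URL path (the jar location and class name are invented for illustration, not taken from a real build):
public class JarPathSketch {
  public static void main(String[] args) {
    // A made-up path as url.getPath() would return it for a class inside a JAR.
    String path = "file:/home/user/mahout/dist/mahout-core.jar"
        + "!/org/apache/mahout/clustering/canopy/CanopyDriver.class";
    if (path.startsWith("file:")) {
      path = path.substring("file:".length());
    }
    // replaceAll("!.*$", "") drops everything from the '!' on, i.e. the in-JAR part.
    System.out.println(path.replaceAll("!.*$", ""));
    // prints: /home/user/mahout/dist/mahout-core.jar
  }
}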
Dawid
Dawid Weiss wrote:
I changed the mains to pass in the location of the jar, since the ANT task puts the jar in basedir/dist. I made a comment about it on Mahout-3. The Canopy driver should do the right thing now, I think. I also did the same thing w/ the k-means driver.
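Roughly, the change amounts to something like this (a sketch only -- the driver class, argument position, and path are made up here, not the actual Mahout code):
import org.apache.hadoop.mapred.JobConf;

public class MyDriver {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Hypothetical: the first argument carries the jar that ANT built into basedir/dist.
    conf.setJar(args[0]);
    // ... set input/output paths, mapper/reducer, then submit with JobClient.runJob(conf).
  }
}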
I honestly don't think the JAR file must be specified as part of
the JobConf. This is a hint, but it's a hint used only in very
special cases (which I can't think of, to be honest). To my
understanding, the situation is like this:
- When you assemble a job JAR, you should package it with all required dependencies under the {jarfile.jar}/lib folder.
- All these classes are visible through the context class loader set up by Hadoop, so no special JAR tricks are required. When you submit a Hadoop job (remotely), you point at the JAR file with all the dependencies and Hadoop takes it from there.
- When you run the in-memory task tracker (for debugging or local runs), all the classes should be available on the normal classpath, and the context class loader (again) should resolve them successfully.
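If the explicit path really is unnecessary, the alternative would look roughly like this (a minimal sketch, assuming the setJarByClass hook on the old mapred JobConf; the class name is made up):
import org.apache.hadoop.mapred.JobConf;

public class NoExplicitJarSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Instead of passing an explicit jar path in, let Hadoop locate the jar that
    // contains the driver class -- internally this relies on the findContainingJar
    // trick quoted above.
    conf.setJarByClass(NoExplicitJarSketch.class);
    // ... configure and submit as usual.
  }
}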
Can you enlighten me as to when pointing at an explicit JAR file in the JobConf is actually required?
Dawid
--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ