Hi,

When reading a value of a Serializable type that is defined inside the
job's jar file, I get a ClassNotFoundException (wrapped in an IOException).

I can reproduce the exception with freshly unpacked hadoop-0.20.2.tar.gz
and hadoop-0.21.0.tar.gz distributions, tested under Sun's JDK 1.6.0_24.

I've attached a short example that reproduces the bug. I compiled it and
packed it into a jar.

When running it from the command line, I get the following output for
hadoop-0.21.0:

$ /tmp/hadoop-0.21.0/bin/hadoop jar demonstration.jar DemonstrationForPossibleBugInJavaSerialization
My ClassLoaderjava.net.URLClassLoader@1cbfe9d
JavaSerialization's ClassLoader: sun.misc.Launcher$AppClassLoader@cac268
11/03/03 10:40:42 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
Exception in thread "main" java.io.IOException: java.lang.ClassNotFoundException: DemonstrationForPossibleBugInJavaSerialization$MyValueClass
        at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:62)
        at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:43)
        at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1886)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1859)
        at DemonstrationForPossibleBugInJavaSerialization.main(DemonstrationForPossibleBugInJavaSerialization.java:52)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:192)

And the same for hadoop-0.20.2:

$ /tmp/hadoop-0.20.2/bin/hadoop jar demonstration.jar DemonstrationForPossibleBugInJavaSerialization
My ClassLoaderjava.net.URLClassLoader@e45076
JavaSerialization's ClassLoader: sun.misc.Launcher$AppClassLoader@1a16869
Exception in thread "main" java.io.IOException: java.lang.ClassNotFoundException: DemonstrationForPossibleBugInJavaSerialization$MyValueClass
        at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:55)
        at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:36)
        at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
        at DemonstrationForPossibleBugInJavaSerialization.main(DemonstrationForPossibleBugInJavaSerialization.java:52)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


Possible reason:

I think it's a ClassLoader issue.

DemonstrationForPossibleBugInJavaSerialization is loaded by the
URLClassLoader created in org.apache.hadoop.util.RunJar. That loader has
the job's jar file on its search path, so it has no problem loading
DemonstrationForPossibleBugInJavaSerialization$MyValueClass.

JavaSerialization is loaded by the URLClassLoader's parent, which in
this case is sun.misc.Launcher$AppClassLoader.

Inside
org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer
an ObjectInputStream is used for deserialization. If you look into
ObjectInputStream.resolveClass(...), it uses the
"latestUserDefinedLoader" on the stack to resolve the class. In this
case that is JavaSerialization's ClassLoader and not the URLClassLoader
with access to (and knowledge of) the custom MyValueClass.


Possible solution:

The ObjectInputStream used inside the JavaSerializationDeserializer
should override resolveClass(...) and resolve classes with the
ClassLoader returned by
org.apache.hadoop.conf.Configuration.getClassLoader().
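
To make the idea concrete, here is a minimal sketch of such a stream. The
class name ClassLoaderObjectInputStream is hypothetical and not part of
Hadoop; it only shows the resolveClass(...) override, under the assumption
that the deserializer can get hold of a suitable ClassLoader:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

/**
 * Hypothetical helper (not an existing Hadoop class): an ObjectInputStream
 * that resolves classes through an explicitly supplied ClassLoader instead
 * of the latest user-defined loader on the call stack.
 */
class ClassLoaderObjectInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    ClassLoaderObjectInputStream(InputStream in, ClassLoader loader)
            throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Look the class up in the supplied loader first.
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            // Fall back to the default lookup, which also handles
            // primitive type names such as "int".
            return super.resolveClass(desc);
        }
    }
}
```

Inside the deserializer, the stream would then presumably be created with
the loader from Configuration.getClassLoader() instead of as a plain
ObjectInputStream. How the Configuration reaches the deserializer is left
open here.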

Do you think this is a bug, or am I just using it incorrectly?

Thanks for any answers,
  Christian
import java.io.IOException;
import java.io.Serializable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.serializer.JavaSerialization;
import org.apache.hadoop.io.serializer.WritableSerialization;

/**
 * Example to demonstrate a possible bug in JavaSerialization.
 */
@SuppressWarnings("serial")
public class DemonstrationForPossibleBugInJavaSerialization {
	public static class MyValueClass implements Serializable {
		String value = "something";

		@Override
		public String toString() {
			return value;
		}
	}

	public static void main(final String[] args) throws IOException {
		final Configuration conf = new Configuration();
		final IntWritable key = new IntWritable(123);
		final MyValueClass value = new MyValueClass();

		// Register JavaSerialization in addition to the default
		// WritableSerialization so Serializable values can be stored.
		conf.setStrings("io.serializations",
				WritableSerialization.class.getName(),
				JavaSerialization.class.getName());

		final FileSystem fs = FileSystem.get(conf);
		final Path path = new Path("/tmp/sequencefile.tmp");

		// Writing one record works.
		final SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
				path, IntWritable.class, MyValueClass.class);
		writer.append(key, value);
		writer.close();

		// Reading the value back throws the ClassNotFoundException.
		final SequenceFile.Reader reader = new Reader(fs, path, conf);
		if (reader.next(key)) {
			System.out.println(reader.getCurrentValue(value));
		}
		reader.close();
	}
}
