Hi all
I'm in trouble with ObjectWritable. I'm trying to implement a simple
indexing job with Lucene & Hadoop, taking inspiration from the Nutch
code. In Nutch's Indexer.java, at line 245, I read:
output.collect(key, new ObjectWritable(doc));
(where doc is a Lucene Document, created at line 199 of the same file:
Document doc = new Document();)
So I tried to do the same, but I get an error, as if ObjectWritable
couldn't handle the Document type:
/opt/java/bin/java -Didea.launcher.port=7539
-Didea.launcher.bin.path=/opt/idea-6180/bin -Dfile.encoding=UTF-8
-classpath
/opt/jdk1.5.0_12/jre/lib/charsets.jar:/opt/jdk1.5.0_12/jre/lib/jce.jar:/opt/jdk1.5.0_12/jre/lib/jsse.jar:/opt/jdk1.5.0_12/jre/lib/plugin.jar:/opt/jdk1.5.0_12/jre/lib/deploy.jar:/opt/jdk1.5.0_12/jre/lib/javaws.jar:/opt/jdk1.5.0_12/jre/lib/rt.jar:/opt/jdk1.5.0_12/jre/lib/ext/localedata.jar:/opt/jdk1.5.0_12/jre/lib/ext/dnsns.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunpkcs11.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunjce_provider.jar:/home/samuel/IdeaProjects/LuceneScratchPad/classes/test/LuceneScratchPad:/home/samuel/IdeaProjects/LuceneScratchPad/classes/production/LuceneScratchPad:/home/samuel/IdeaProjects/LuceneScratchPad/lib/log4j/log4j-1.2.14.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/lucene/lucene-core-2.2.0.jar:/home/samuel/hadoop-0.13.1/hadoop-0.13.1-core.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/commons-logging-1.1/commons-logging-1.1.jar:/home/samuel/commons-cli-1.1/commons-cli-1.1.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/hadoop/conf:/home/samuel/IdeaProjects/LuceneScratchPad/lib/commons-httpclient-3.0.1.jar:/opt/idea-6180/lib/idea_rt.jar
com.intellij.rt.execution.application.AppMain
com.lingway.proto.lucene.EntryPointHadoop
INFO apache.hadoop.mapred.FileInputFormat - Total input paths to
process : 9
INFO apache.hadoop.mapred.JobClient - Running job: job_myhhdn
INFO apache.hadoop.mapred.MapTask - numReduceTasks: 1
WARN apache.hadoop.mapred.LocalJobRunner - job_myhhdn
java.io.IOException: Can't write:
indexed,tokenized<content:[EMAIL PROTECTED]>
as class org.apache.lucene.document.Field
at
org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:157)
at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:65)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:365)
at com.lingway.proto.lucene.MapIndexer.map(MapIndexer.java:35)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:186)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:131)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at
com.lingway.proto.lucene.EntryPointHadoop.main(EntryPointHadoop.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
Process finished with exit code 1
However, in my code, if I separate out the instantiation of the
ObjectWritable, its creation doesn't cause any trouble; the error only
occurs when I pass it to the OutputCollector...
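As a possible workaround, I'm considering serializing the Document to bytes myself (Document appears to implement java.io.Serializable in Lucene 2.2 -- please correct me if not) and carrying those bytes in a BytesWritable instead of an ObjectWritable. Here is a rough sketch of the round-trip; FakeDoc is a stand-in class of my own, used only so the snippet compiles without the Lucene jar, and I'm not at all sure this is the right approach:

```java
import java.io.*;

// Stand-in for org.apache.lucene.document.Document; the real class would
// go through the same code path if it is java.io.Serializable.
class FakeDoc implements Serializable {
    final String content;
    FakeDoc(String content) { this.content = content; }
}

public class SerializableWrapperSketch {

    // Serialize any Serializable object to a byte array. In the real job
    // these bytes would be wrapped in a hadoop BytesWritable before being
    // handed to the OutputCollector.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(obj);
        oos.close();
        return bos.toByteArray();
    }

    // Deserialize on the reduce side (or in a custom RecordWriter).
    static Object fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes));
        return ois.readObject();
    }

    public static void main(String[] args) throws Exception {
        FakeDoc doc = new FakeDoc("hello");
        FakeDoc back = (FakeDoc) fromBytes(toBytes(doc));
        System.out.println(back.content); // prints "hello"
    }
}
```

Would something like this be reasonable, or is there a cleaner way to get a Document through the shuffle?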
Any idea why the Nutch code doesn't behave the same way in my project?
(I can't afford the time to get Nutch itself running; I'm at the very
end of my internship, so I'm in quite a hurry :( )
Thanks in advance,
Sam