Perhaps they are using a patched version of Lucene? It would appear that Java doesn't know how to serialize the org.apache.lucene.analysis.standard.StandardTokenizer class, and so they might have added serialization methods for it to their copy of Lucene.
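An alternative to patching Lucene would be to wrap the Document's stored fields in a small custom Writable and collect that instead of the Document itself. The sketch below is only illustrative: it declares a local stand-in for Hadoop's org.apache.hadoop.io.Writable interface so it compiles without Hadoop or Lucene on the classpath, and the DocumentWritable class and its field map are hypothetical names, not anything from Nutch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for org.apache.hadoop.io.Writable, declared locally so this
// sketch is self-contained (assumption: same two-method contract).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical wrapper: carries the stored fields of a Lucene Document
// as name -> value string pairs, which Hadoop can serialize, instead of
// the non-Writable Document object itself.
public class DocumentWritable implements Writable {
    private final Map<String, String> fields = new LinkedHashMap<String, String>();

    public void add(String name, String value) { fields.put(name, value); }
    public Map<String, String> fields() { return fields; }

    public void write(DataOutput out) throws IOException {
        out.writeInt(fields.size());
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    public void readFields(DataInput in) throws IOException {
        fields.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            fields.put(in.readUTF(), in.readUTF());
        }
    }

    public static void main(String[] args) throws IOException {
        DocumentWritable original = new DocumentWritable();
        original.add("title", "hello hadoop");

        // Round-trip through a byte stream, as Hadoop does internally.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buf));

        DocumentWritable copy = new DocumentWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.fields()); // prints {title=hello hadoop}
    }
}
```

On the reduce side one would rebuild a Document from the field pairs. This loses anything a Document carries beyond stored name/value strings (boosts, index/store flags), so it's only a starting point.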
Good luck,
Stu

-----Original Message-----
From: Samuel LEMOINE <[EMAIL PROTECTED]>
Sent: Thu, August 16, 2007 4:36 am
To: [email protected]
Subject: ObjectWritable(Document)

Hi all,

I'm having trouble with ObjectWritable. I'm trying to implement simple indexing with Lucene & Hadoop, taking inspiration from the Nutch code. At line 245 of Nutch's Indexer.java, I read:

    output.collect(key, new ObjectWritable(doc));

(where doc is a Lucene Document: "Document doc = new Document();", line 199 of the same file). So I try to do the same, but I get an error as if ObjectWritable couldn't handle the Document type:

/opt/java/bin/java -Didea.launcher.port=7539 -Didea.launcher.bin.path=/opt/idea-6180/bin -Dfile.encoding=UTF-8 -classpath /opt/jdk1.5.0_12/jre/lib/charsets.jar:/opt/jdk1.5.0_12/jre/lib/jce.jar:/opt/jdk1.5.0_12/jre/lib/jsse.jar:/opt/jdk1.5.0_12/jre/lib/plugin.jar:/opt/jdk1.5.0_12/jre/lib/deploy.jar:/opt/jdk1.5.0_12/jre/lib/javaws.jar:/opt/jdk1.5.0_12/jre/lib/rt.jar:/opt/jdk1.5.0_12/jre/lib/ext/localedata.jar:/opt/jdk1.5.0_12/jre/lib/ext/dnsns.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunpkcs11.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunjce_provider.jar:/home/samuel/IdeaProjects/LuceneScratchPad/classes/test/LuceneScratchPad:/home/samuel/IdeaProjects/LuceneScratchPad/classes/production/LuceneScratchPad:/home/samuel/IdeaProjects/LuceneScratchPad/lib/log4j/log4j-1.2.14.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/lucene/lucene-core-2.2.0.jar:/home/samuel/hadoop-0.13.1/hadoop-0.13.1-core.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/commons-logging-1.1/commons-logging-1.1.jar:/home/samuel/commons-cli-1.1/commons-cli-1.1.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/hadoop/conf:/home/samuel/IdeaProjects/LuceneScratchPad/lib/commons-httpclient-3.0.1.jar:/opt/idea-6180/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.lingway.proto.lucene.EntryPointHadoop

INFO apache.hadoop.mapred.FileInputFormat - Total input paths to process : 9
INFO apache.hadoop.mapred.JobClient - Running job: job_myhhdn
INFO apache.hadoop.mapred.MapTask - numReduceTasks: 1
WARN apache.hadoop.mapred.LocalJobRunner - job_myhhdn
java.io.IOException: Can't write: indexed,tokenized as class org.apache.lucene.document.Field
        at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:157)
        at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:65)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:365)
        at com.lingway.proto.lucene.MapIndexer.map(MapIndexer.java:35)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:186)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:131)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at com.lingway.proto.lucene.EntryPointHadoop.main(EntryPointHadoop.java:36)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
Process finished with exit code 1

However, in my own code, instantiating the ObjectWritable separately doesn't cause any trouble; the error only occurs when I pass it to the OutputCollector... Any idea why the Nutch code doesn't behave the same way in my project? (I can't afford the time to get Nutch itself running; I'm at the very end of my internship, so I'm quite in a hurry :( )

Thanks in advance,
Sam
