Perhaps they are using a patched version of Lucene? Judging from the trace 
below, Hadoop's ObjectWritable doesn't know how to serialize the 
org.apache.lucene.document.Field class, and so they might have added 
serialization support for it to their copy of Lucene.
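For what it's worth, such serialization support could also be hand-rolled on 
the Hadoop side instead of patching Lucene, by wrapping the Document in a 
custom Writable. A minimal sketch, assuming string-valued, stored fields only 
(the DocumentWritable class below is hypothetical, not part of Lucene, 
Hadoop, or Nutch):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.Writable;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Hypothetical wrapper that makes a Lucene Document usable as a Hadoop
// map/reduce value. Assumes every field is stored as a plain string.
public class DocumentWritable implements Writable {

    private Document doc;

    public DocumentWritable() { this.doc = new Document(); } // no-arg ctor needed for deserialization
    public DocumentWritable(Document doc) { this.doc = doc; }

    public Document get() { return doc; }

    public void write(DataOutput out) throws IOException {
        List fields = doc.getFields();
        out.writeInt(fields.size());
        for (int i = 0; i < fields.size(); i++) {
            Field f = (Field) fields.get(i);
            out.writeUTF(f.name());
            out.writeUTF(f.stringValue());
            out.writeBoolean(f.isTokenized());
        }
    }

    public void readFields(DataInput in) throws IOException {
        doc = new Document();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String name = in.readUTF();
            String value = in.readUTF();
            // Lucene 2.2-era index flags: TOKENIZED vs. UN_TOKENIZED
            Field.Index index = in.readBoolean() ? Field.Index.TOKENIZED
                                                 : Field.Index.UN_TOKENIZED;
            doc.add(new Field(name, value, Field.Store.YES, index));
        }
    }
}

One could then emit new DocumentWritable(doc) instead of new 
ObjectWritable(doc) from the mapper.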

Good luck,
Stu


-----Original Message-----
From: Samuel LEMOINE <[EMAIL PROTECTED]>
Sent: Thu, August 16, 2007 4:36 am
To: [email protected]
Subject: ObjectWritable(Document)

Hi all,
I'm having trouble with ObjectWritable. I'm trying to implement simple 
indexing with Lucene & Hadoop, and for that I'm taking inspiration from the 
Nutch code. In Nutch's Indexer.java, at line 245, I read:
output.collect(key, new ObjectWritable(doc));

(where doc is a Lucene Document: Document doc = new Document(); at line 199 
of the same file)
So I try to do the same, but I get an error, as if ObjectWritable couldn't 
handle the Document type:

/opt/java/bin/java -Didea.launcher.port=7539 
-Didea.launcher.bin.path=/opt/idea-6180/bin -Dfile.encoding=UTF-8 
-classpath 
/opt/jdk1.5.0_12/jre/lib/charsets.jar:/opt/jdk1.5.0_12/jre/lib/jce.jar:/opt/jdk1.5.0_12/jre/lib/jsse.jar:/opt/jdk1.5.0_12/jre/lib/plugin.jar:/opt/jdk1.5.0_12/jre/lib/deploy.jar:/opt/jdk1.5.0_12/jre/lib/javaws.jar:/opt/jdk1.5.0_12/jre/lib/rt.jar:/opt/jdk1.5.0_12/jre/lib/ext/localedata.jar:/opt/jdk1.5.0_12/jre/lib/ext/dnsns.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunpkcs11.jar:/opt/jdk1.5.0_12/jre/lib/ext/sunjce_provider.jar:/home/samuel/IdeaProjects/LuceneScratchPad/classes/test/LuceneScratchPad:/home/samuel/IdeaProjects/LuceneScratchPad/classes/production/LuceneScratchPad:/home/samuel/IdeaProjects/LuceneScratchPad/lib/log4j/log4j-1.2.14.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/lucene/lucene-core-2.2.0.jar:/home/samuel/hadoop-0.13.1/hadoop-0.13.1-core.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/commons-logging-1.1/commons-logging-1.1.jar:/home/samuel/commons-cli-1.1/commons-cli-1.1.jar:/home/samuel/IdeaProjects/LuceneScratchPad/lib/hadoop/conf:/home/samuel/IdeaProjects/LuceneScratchPad/lib/commons-httpclient-3.0.1.jar:/opt/idea-6180/lib/idea_rt.jar
 
com.intellij.rt.execution.application.AppMain com.lingway.proto.lucene.EntryPointHadoop
INFO  apache.hadoop.mapred.FileInputFormat - Total input paths to process : 9
INFO  apache.hadoop.mapred.JobClient - Running job: job_myhhdn
INFO  apache.hadoop.mapred.MapTask - numReduceTasks: 1
WARN  apache.hadoop.mapred.LocalJobRunner - job_myhhdn
java.io.IOException: Can't write: indexed,tokenized as class org.apache.lucene.document.Field
    at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:157)
    at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:65)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:365)
    at com.lingway.proto.lucene.MapIndexer.map(MapIndexer.java:35)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:186)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:131)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at com.lingway.proto.lucene.EntryPointHadoop.main(EntryPointHadoop.java:36)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)

Process finished with exit code 1

However, in my code, if I separate out the instantiation of the 
ObjectWritable object, its creation doesn't cause any trouble; the error 
only occurs when I try to pass it to the OutputCollector...
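For reference, the map() in question looks roughly like this (reconstructed 
from the stack trace above; the "content" field name and use of 
value.toString() are only illustrative, and this is the old 0.13-era Mapper 
signature):

// Roughly the failing map() (com.lingway.proto.lucene.MapIndexer in the
// trace); the "content" field name and value handling are guesses.
public void map(WritableComparable key, Writable value,
                OutputCollector output, Reporter reporter) throws IOException {
    Document doc = new Document();
    doc.add(new Field("content", value.toString(),
                      Field.Store.YES, Field.Index.TOKENIZED));
    ObjectWritable wrapped = new ObjectWritable(doc); // constructing is fine
    output.collect(key, wrapped);                     // serializing here throws
}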

Any idea why the Nutch code doesn't behave the same way in my project?
(I can't afford the time to get Nutch running; I'm at the very end of my 
internship, so I'm quite in a hurry :( )

Thanks in advance,

Sam
