Hi,
I was trying to generate vectors from a Lucene index using the lucene.vector
driver. It worked fine with Mahout 0.4, but with Mahout 0.5 I get the
following exception:
SEVERE: There are too many documents that do not have a term vector for description
Exception in thread "main" java.lang.IllegalStateException: There are too many documents that do not have a term vector for description
at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:114)
at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:41)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:206)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
My Lucene index was created using:
doc.add(new Field("documentId", documentId, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("content", content, Field.Store.YES, Field.Index.ANALYZED, TermVector.YES));
If this is a known issue, sorry for the duplicate. Otherwise, let me know if I
can help reproduce it.
-Philippe
--
Philippe Adjiman | twitter: padjiman
linkedin: il.linkedin.com/in/philippeadjiman | blog: http://philippeadjiman.com/blog