I took the stock Lucene 'IndexFiles' class, modified it to read UTF-8,
and ran it.
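
For concreteness, the UTF-8 change amounts to roughly the following. This is a minimal sketch, not the exact patch: the class and method names (Utf8ReaderSketch, openUtf8) are hypothetical, and it uses only JDK classes, no Lucene. The point is just that the file is opened with an explicit UTF-8 decoder instead of the platform default charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only; not part of Lucene or Mahout.
class Utf8ReaderSketch {

    // Open a file with an explicit UTF-8 decoder rather than the
    // platform default charset.
    static BufferedReader openUtf8(Path file) throws IOException {
        return new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Write a short Hebrew sample as UTF-8 bytes, then read it back.
        Path sample = Files.createTempFile("utf8-sample", ".txt");
        Files.write(sample, "\u05e9\u05dc\u05d5\u05dd".getBytes(StandardCharsets.UTF_8));
        try (BufferedReader reader = openUtf8(sample)) {
            String line = reader.readLine();
            System.out.println("read " + line.length() + " characters");
        } finally {
            Files.delete(sample);
        }
    }
}
```

The reader returned here would be handed to the indexing code in place of a default-charset reader.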

Then I ran the following:

java -cp $cp org.apache.mahout.utils.vectors.lucene.Driver \
   --dir he_lucene_index \
   --output he_mahout_vector --field contents --dictOut he_mahout_dict \
   --idField path

and am rewarded with a tiny file of vectors. Clearly I'm messing something up.
