On Apr 29, 2009, at 10:27 AM, Shashikant Kore wrote:
Hi Jeff,
The JDK problem occurs while running the example of Synthetic
Control Data from
http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html
The other query was about how to convert text files to
Mahout Vector. Let's say I have text files of Wikipedia pages and now
I want to create clusters out of them. How do I get the Mahout vector
from the Lucene index? Can you point me to some theory behind it, which
I can then convert into code?
I don't think we have any demo code for this yet. I have a personal
task that I'm trying to get to that will demonstrate how to cluster
text starting from a plain text file, but nothing in code yet,
especially not anything that takes it from Lucene. All of these would
be great additions to have. I think Richard Tomsett said he had some
code to do it, but hasn't donated it yet. He's also put up a patch
for a cosine distance metric, but it is not committed yet.
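
On the theory side, the usual starting point is the vector space model:
each document becomes a sparse vector with one dimension per term,
weighted by term frequency times inverse document frequency (TF-IDF),
and documents are compared by cosine distance. Below is a minimal
sketch of that idea in plain Java. It does not use the Mahout or Lucene
APIs; the class name TfIdfSketch, its method names, the naive tokenizer,
and the unsmoothed log(N/df) weighting are all assumptions made for
illustration, not what Mahout or Richard's patch actually does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of Mahout or Lucene.
final class TfIdfSketch {

    // Naive tokenizer: lowercase, split on anything that is not a letter or digit.
    // A real pipeline would use a proper analyzer (stop words, stemming, etc.).
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Builds one sparse vector (term -> TF-IDF weight) per input document.
    // Weight = raw term count * log(N / documentFrequency), with no smoothing,
    // so a term that occurs in every document gets weight 0.
    public static List<Map<String, Double>> tfIdf(List<String> docs) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String term : tokenize(doc)) {
                tf.merge(term, 1, Integer::sum);
            }
            counts.add(tf);
            for (String term : tf.keySet()) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : counts) {
            Map<String, Double> vec = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                vec.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    // Cosine distance = 1 - cosine similarity. By convention here, an
    // all-zero vector is treated as maximally distant (distance 1.0).
    public static double cosineDistance(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0;
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

To feed something like this into a clustering job, the term -> weight
maps would still need to be mapped onto integer indices (one per
distinct term) so they can be stored in a sparse vector. When starting
from a Lucene index, the term counts and document frequencies are
already there, so only the weighting and the index mapping remain.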
Cheers,
Grant
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search