On Apr 29, 2009, at 10:27 AM, Shashikant Kore wrote:

Hi Jeff,

The JDK problem occurs while running the example of Synthetic Control Data from
http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html


The other query was about how to convert text files to
Mahout Vectors. Let's say I have text files of Wikipedia pages and now
I want to create clusters out of them. How do I get the Mahout vector
from the Lucene index? Can you point me to some theory behind it, which
I can then convert to code?

I don't think we have any demo code for this yet. I have a personal task that I'm trying to get to that will demonstrate how to cluster text starting from a plain text file, but there is nothing in code yet, and certainly nothing that starts from a Lucene index. All of these would be great additions to have. I think Richard Tomsett said he had some code to do it, but he hasn't donated it yet. He has also put up a patch for a cosine distance metric, but it has not been committed yet.
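
As a rough illustration of the theory asked about, the usual approach is to turn each document into a term-weight vector (for example TF-IDF) and compare vectors with cosine distance. The sketch below is plain Python, not Mahout or Lucene code; the function names (tokenize, tfidf_vectors, cosine_distance) and the simple tf * log(N/df) weighting are assumptions made for the example, not anything from an existing patch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse vector (dict of term -> weight) per document.

    Weight is raw term frequency times log(N / document frequency).
    This weighting is an assumption; other TF-IDF variants exist.
    """
    term_counts = [Counter(tokenize(doc)) for doc in docs]

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in counts.items()}
        for counts in term_counts
    ]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero vector shares nothing measurable with any other.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "apache lucene search index",
        "lucene index search engine",
        "clustering with k means",
    ]
    vectors = tfidf_vectors(docs)
    print(cosine_distance(vectors[0], vectors[1]))
    print(cosine_distance(vectors[0], vectors[2]))
```

Vectors built this way could then be handed to a clustering algorithm that uses cosine distance as its measure. Starting from a Lucene index instead of raw text would replace the tokenize and counting steps with term statistics read from the index.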

Cheers,
Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
