I think Shashikant was using a modified form of Mahout that encoded
the labels in the output.
I think we're still a little bit away from having a utility that truly makes it straightforward to go from text to clusterable vectors.
No doubt what is happening is the recognition that we need some kind of pipeline process that can work with multiple data sources, output various consumable formats, and help with feature selection. Unfortunately, we aren't there just yet.
-Grant
On May 29, 2009, at 11:27 AM, Benson Margulies wrote:
I'll fish for one more hint. I'm using the MAHOUT-126 code to turn text into data via TF-IDF. What comes out of there is not in the same format as your example data. Does this mean that I need a different InputDriver? Is one lying about for the format written by that DocumentVector class?
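For readers unfamiliar with the weighting Benson mentions, here is a minimal sketch of the standard TF-IDF formula itself. This is not the MAHOUT-126 code; the class and method names are hypothetical, and the actual Mahout implementation emits its results in its own vector format.

```java
// Hypothetical illustration of TF-IDF weighting (not the MAHOUT-126 code).
public class TfIdfSketch {

    // Classic tf-idf: term frequency scaled by the log of
    // (total documents / documents containing the term).
    static double tfIdf(int termFreq, int numDocs, int docFreq) {
        return termFreq * Math.log((double) numDocs / docFreq);
    }

    public static void main(String[] args) {
        // A term occurring 3 times in one document, present in 2 of 10 documents.
        System.out.println(tfIdf(3, 10, 2));
    }
}
```

Each document then becomes a vector of such weights, one entry per term, which is the kind of output that then needs converting into whatever format the clustering jobs consume.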
On Fri, May 29, 2009 at 10:29 AM, Jeff Eastman <[email protected]> wrote:
Benson Margulies wrote:
OK, I've got some inputs, I want to run k-means, how do I feed the beast?
Make sure you can run the Synthetic Control example to get everything wired together correctly: JDK, Hadoop, Mahout. See http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html. Then write an input job to convert your data, similar to /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/InputDriver.java, and make a new job like /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java.
You will have a small adventure and then be operational.
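The conversion step Jeff describes boils down to parsing each input record into a numeric vector the clustering jobs can read. A minimal sketch of that parsing, assuming whitespace-delimited numbers like the synthetic control data (the class name is hypothetical, and the real InputDriver additionally writes the vectors out through Hadoop rather than returning arrays):

```java
import java.util.Arrays;

// Hypothetical sketch of the parsing an InputDriver-style job performs:
// turning one whitespace-delimited line of numbers into a dense vector.
public class InputParseSketch {

    static double[] parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        double[] vector = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            vector[i] = Double.parseDouble(tokens[i]);
        }
        return vector;
    }

    public static void main(String[] args) {
        // Illustrative values only, in the style of the synthetic control rows.
        double[] v = parseLine("28.78 24.89 31.01");
        System.out.println(Arrays.toString(v));
    }
}
```

Benson's problem in the thread is precisely that TF-IDF output is not in this simple delimited form, hence the need for a different converter.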
Have fun,
Jeff