Hi Jeff,

The JDK problem occurs while running the Synthetic Control Data example from http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html
The other query was about how to convert text files to Mahout Vectors. Let's say I have text files of Wikipedia pages and now I want to create clusters out of them. How do I get the Mahout vector from the Lucene index? Can you point me to some theory behind it, from which I can translate it into code?

Thanks,
--shashi

On Wed, Apr 29, 2009 at 10:50 PM, Jeff Eastman <[email protected]> wrote:
> Hi Shashi,
>
> That does sound like a JDK version problem. Most jobs require an initial
> step to get the input into the correct vector format to use the clustering
> code. The
> /Mahout/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy/Job.java
> calls an InputDriver that does that for the syntheticcontrol examples. You
> would need to do something similar to massage your data into Mahout Vector
> format before you can run the clustering job of your choosing.
>
> Jeff
>
> Shashikant Kore wrote:
>>
>> Thanks for the response, Grant.
>>
>> Upgrading Hadoop didn't really help. Now I am not able to launch even
>> the Namenode, JobTracker, ... as I am getting the same error. I suspect
>> a version conflict somewhere, as there are two JDK versions on the box.
>> I will try it out on another box which has only JDK 6.
>>
>> From the documentation of clustering, it is not clear how to get the
>> vectors from text (or HTML) files. I suppose you can get TF-IDF
>> values by indexing this content with Lucene. How does one proceed from
>> there? Any pointers on that are appreciated.
>>
>> --shashi
>>
>> On Tue, Apr 28, 2009 at 8:40 PM, Grant Ingersoll <[email protected]> wrote:
>>>
>>> On Apr 28, 2009, at 6:01 AM, Shashikant Kore wrote:
>>>
>>>> Hi,
>>>>
>>>> Initially, I got the version number error at the beginning. I found
>>>> that the JDK version was 1.5. It has been upgraded to 1.6. Now
>>>> JAVA_HOME points to /usr/java/jdk1.6.0_13/ and I am using Hadoop
>>>> 0.18.3.
>>>>
>>>> 1. What could possibly be wrong? I checked the Hadoop script, and the
>>>> value of JAVA_HOME is correct (i.e. 1.6). Is it possible that somehow
>>>> it is still using 1.5?
>>>
>>> I'm going to guess the issue is that you need Hadoop 0.19.
>>>
>>>> 2. The last step of the clustering tutorial says "Get the data out of
>>>> HDFS and have a look." Can you please point me to the documentation of
>>>> Hadoop on how to read this data?
>>>
>>> http://hadoop.apache.org/core/docs/current/quickstart.html towards the
>>> bottom. It shows some of the commands you can use w/ HDFS. -get, -cat,
>>> etc.
>>>
>>> -Grant

--
Co-founder, Discrete Log Technologies
http://www.bandhan.com/
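
A minimal sketch of the Lucene-to-Mahout conversion asked about above, assuming term vectors were stored at indexing time and using the Lucene 2.x TermFreqVector API. The Mahout class names (org.apache.mahout.matrix.SparseVector), the "contents" field, and the dictionary map are assumptions for illustration, not the project's prescribed path:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermFreqVector;
    import org.apache.mahout.matrix.SparseVector;
    import org.apache.mahout.matrix.Vector;

    public class LuceneToMahoutSketch {
      // Builds one Mahout Vector for a document from its stored term vector.
      // Assumes the "contents" field was indexed with term vectors enabled,
      // and that 'dictionary' maps each term to a fixed dimension index so
      // all documents share the same vector space.
      public static Vector toVector(IndexReader reader, int docId,
                                    java.util.Map<String, Integer> dictionary)
          throws java.io.IOException {
        Vector vector = new SparseVector(dictionary.size());
        TermFreqVector tfv = reader.getTermFreqVector(docId, "contents");
        if (tfv == null) {
          return vector; // no term vector stored for this document/field
        }
        String[] terms = tfv.getTerms();
        int[] freqs = tfv.getTermFrequencies();
        for (int i = 0; i < terms.length; i++) {
          Integer index = dictionary.get(terms[i]);
          if (index != null) {
            // Raw term frequency; a TF-IDF weight could be substituted here
            // using IndexReader.docFreq() and IndexReader.numDocs().
            vector.set(index, freqs[i]);
          }
        }
        return vector;
      }
    }

Each resulting Vector would then be written out in whatever input format the chosen clustering job expects, analogous to what the syntheticcontrol InputDriver does for its numeric data.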

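On the last question, Grant's link covers the shell side (hadoop fs -get, hadoop fs -cat, etc.). For reading the output programmatically, here is a sketch using Hadoop's SequenceFile.Reader, assuming the clustering job wrote its results as a SequenceFile; the output path below is hypothetical and should be replaced with the job's actual part file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class DumpClusterOutput {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical path; point this at the job's real output file in HDFS.
        Path path = new Path("output/clusters-0/part-00000");
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        // Instantiate key/value holders of whatever Writable types the file declares.
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        while (reader.next(key, value)) {
          System.out.println(key + "\t" + value);
        }
        reader.close();
      }
    }

This is just the generic SequenceFile read loop; the useful part is that it prints the key and value classes' toString() output, which is usually enough to "have a look" at what the clustering step produced.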