Hi,
I have a few questions about Mahout.

Requirement:
I need to find documents that are similar to each other using Mahout. My input will be an XML file, similar to the Wikipedia input.

Configuration doubts:
Which Mahout build do I need to download? I can see the following:
mahout-0.3.zip <http://apache.oss.eznetsols.org/mahout/0.3/mahout-0.3.zip>
mahout-0.3-src.zip <http://apache.oss.eznetsols.org/mahout/0.3/mahout-0.3-src.zip>
mahout-0.3.tar.bz2 <http://apache.oss.eznetsols.org/mahout/0.3/mahout-0.3.tar.bz2>
Which one should I download to work with?

Configuration steps I am following:
I have configured Hadoop, Cygwin and Mahout 0.3 on Windows using this link:
http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/

HADOOP_CONF_DIR = C:\cygwin\home\Divya\hadoop-0.19.2\conf
HADOOP_HOME = C:\cygwin\home\Divya\hadoop-0.19.2
JAVA_HOME = D:\InstalledSoftwares\Java\Java6\jdk1.6.0_21
JAVA_OPTS = -XX:MaxPermSize=128m
MAHOUT_HOME = D:\mahout-0.4
MAHOUT_OPTS = -Xmx1024m
MAVEN_HOME = D:\Downloads\Mahout\maven\apache-maven-2.2.1
Path = C:\cygwin\bin

Steps I am following to generate document similarity:
1. SequenceFilesFromDirectory to generate sequence files
2. SparseVectorsFromSequenceFiles to generate vectors

Now I am stuck: which utility should I use to compute document similarity, and how do I then convert the result to a human-readable format?

Can anyone help me here?

Regards,
Divya
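
P.S. To make clear what I mean by document similarity, here is a rough sketch, assuming cosine similarity over term-weight vectors such as those SparseVectorsFromSequenceFiles produces. It is plain Java and does not use any Mahout API; the class name CosineSketch, the method cosine and the sample terms and weights are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSketch {

    // Cosine similarity between two sparse vectors stored as term -> weight.
    // Returns 0.0 if either vector is empty or has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double w = e.getValue();
            normA += w * w;
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += w * other; // only shared terms contribute
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical weights for two tiny documents.
        Map<String, Double> doc1 = new HashMap<String, Double>();
        doc1.put("hadoop", 0.8);
        doc1.put("mahout", 0.6);

        Map<String, Double> doc2 = new HashMap<String, Double>();
        doc2.put("mahout", 0.5);
        doc2.put("cygwin", 0.5);

        System.out.println("similarity = " + cosine(doc1, doc2));
    }
}
```

What I am looking for is the utility that does this kind of pairwise comparison over all the generated vectors, plus a way to print the result as readable text.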
