[ https://issues.apache.org/jira/browse/MAHOUT-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037896#comment-13037896 ]

Frank Scholten commented on MAHOUT-680:
---------------------------------------

I just ran the following sequence on the cluster from the mahout folder and 
this works as well.

Hadoop setup:

Last login: Mon May 23 08:04:40 2011 from 82.161.41.42
frank@domU-12-31-39-00-1C-22:~$ cd mahout-0.5-680-bug/
frank@domU-12-31-39-00-1C-22:~/mahout-0.5-680-bug$ export HADOOP_HOME=/usr/local/hadoop-0.20.2
frank@domU-12-31-39-00-1C-22:~/mahout-0.5-680-bug$ export HADOOP_CONF_DIR=$HADOOP_HOME/conf
frank@domU-12-31-39-00-1C-22:~/mahout-0.5-680-bug$ export PATH=$PATH:$HADOOP_HOME/bin

Recreating directories:

frank@domU-12-31-39-00-1C-22:~/mahout-0.5-680-bug$ hadoop fs -rmr /user/root/input
Moved to trash: hdfs://ec2-50-17-63-252.compute-1.amazonaws.com/user/root/input
frank@domU-12-31-39-00-1C-22:~/mahout-0.5-680-bug$ hadoop fs -rmr /user/root/output
Moved to trash: hdfs://ec2-50-17-63-252.compute-1.amazonaws.com/user/root/output
frank@domU-12-31-39-00-1C-22:~/mahout-0.5-680-bug$ hadoop fs -mkdir /user/root/input
frank@domU-12-31-39-00-1C-22:~/mahout-0.5-680-bug$ hadoop fs -put README.txt /user/root/input

Running seqdirectory + seq2sparse:

frank@domU-12-31-39-00-1C-22:~/mahout-0.5-680-bug$ bin/mahout seqdirectory --input /user/root/input
frank@domU-12-31-39-00-1C-22:~/mahout-0.5-680-bug$ bin/mahout seq2sparse --input output --output output-seq2sparse

View the tfidf-vectors:

frank@domU-12-31-39-00-1C-22:~/mahout-0.5-680-bug$ bin/mahout hadoop fs -text /user/root/output-seq2sparse/tfidf-vectors/part-r-00000
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop-0.20.2
HADOOP_CONF_DIR=/usr/local/hadoop-0.20.2/conf
/README.txt     org.apache.mahout.math.VectorWritable@4979935d
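
For anyone who wants to replay the sequence above, it could be collected into
one script along these lines. This is only a sketch: DRY_RUN, run() and repro()
are hypothetical helper names that are not part of Mahout or Hadoop, and the
paths are simply the ones from my session. With DRY_RUN=1 (the default here) it
only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: replays the commands from the session above.
# DRY_RUN, run() and repro() are hypothetical helpers, not part of Mahout.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = actually execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

repro() {
  # Hadoop setup (the default path is the one used on my cluster)
  export HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop-0.20.2}"
  export HADOOP_CONF_DIR="$HADOOP_HOME/conf"
  export PATH="$PATH:$HADOOP_HOME/bin"

  # Recreate the input and output directories
  run hadoop fs -rmr /user/root/input
  run hadoop fs -rmr /user/root/output
  run hadoop fs -mkdir /user/root/input
  run hadoop fs -put README.txt /user/root/input

  # seqdirectory + seq2sparse
  run bin/mahout seqdirectory --input /user/root/input
  run bin/mahout seq2sparse --input output --output output-seq2sparse

  # View the tfidf-vectors through bin/mahout
  run bin/mahout hadoop fs -text /user/root/output-seq2sparse/tfidf-vectors/part-r-00000
}
```

Calling repro from the mahout folder with DRY_RUN=0 would execute the steps for
real; the default only lists them.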

Let me know if I missed something or if your setup is different.

> Running the Hadoop script through bin/mahout to set up classpath
> ----------------------------------------------------------------
>
>                 Key: MAHOUT-680
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-680
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.4
>            Reporter: Frank Scholten
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: MAHOUT-680.patch, MAHOUT-680.patch, jobtracker.jsp.html
>
>
> Added a patch which allows you to run the $HADOOP_HOME/bin/hadoop command 
> script through the bin/mahout script.
> This way the Mahout script adds the Mahout classes to the $HADOOP_CLASSPATH 
> so you can view sequencefiles generated by Mahout jobs with
> bin/mahout hadoop fs -text <sequencefile>
> without having to specify Mahout classes manually or getting 
> ClassNotFoundExceptions
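
The idea in the description can be illustrated roughly as follows. This is a
hypothetical sketch and not the attached patch: the function names and the
assumed jar layout (jars in the Mahout folder and in its lib/ subfolder) are
made up for illustration. The only thing it relies on is that the hadoop script
picks up extra entries from HADOOP_CLASSPATH.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the MAHOUT-680 patch itself.

# Print a colon-separated classpath of the jars found under a Mahout folder.
# Assumed layout: <dir>/*.jar and <dir>/lib/*.jar
build_mahout_classpath() {
  mahout_dir="$1"
  classpath_acc=""
  for jar in "$mahout_dir"/*.jar "$mahout_dir"/lib/*.jar; do
    [ -f "$jar" ] || continue                  # skip globs that matched nothing
    classpath_acc="${classpath_acc:+$classpath_acc:}$jar"
  done
  printf '%s\n' "$classpath_acc"
}

# Run $HADOOP_HOME/bin/hadoop with the Mahout jars added to HADOOP_CLASSPATH.
# $1 is the Mahout folder; the remaining arguments go to hadoop unchanged.
run_hadoop_with_mahout() {
  mahout_dir="$1"
  shift
  classpath="$(build_mahout_classpath "$mahout_dir")"
  if [ -n "${HADOOP_CLASSPATH:-}" ]; then
    # Keep whatever the user already had on HADOOP_CLASSPATH
    classpath="${classpath:+$classpath:}$HADOOP_CLASSPATH"
  fi
  HADOOP_CLASSPATH="$classpath" "$HADOOP_HOME/bin/hadoop" "$@"
}
```

With something like this, run_hadoop_with_mahout . fs -text <sequencefile>
would hand the Mahout jars to hadoop, so the Writable classes used in the
sequence file can be loaded.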

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
