mahout seqdirectory reads only from the local filesystem, even when running 
over Hadoop
---------------------------------------------------------------------------------------

                 Key: MAHOUT-535
                 URL: https://issues.apache.org/jira/browse/MAHOUT-535
             Project: Mahout
          Issue Type: Bug
          Components: Utils
    Affects Versions: 0.3, 0.4
         Environment: local and hadoop
            Reporter: Matt Spitz
            Priority: Minor


It seems as if seqdirectory only reads from the local filesystem, though it 
writes correctly to the HDFS.

Consider 'myurls-local' and 'myurls-dfs', the former existing in the working 
directory and the latter existing on the home directory of the HDFS.

Running:
MAHOUT_HOME=. ./bin/mahout seqdirectory -i myurls-local -o myurls-seqdir -c 
UTF-8 -chunk 

acts as expected (myurls-seqdir is created on the local filesystem)


Running:
MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 
HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-dfs 
-o myurls-seqdir -c UTF-8 -chunk 

creates a 12kb myurls-seqdir directory on the DFS.  Presumably, it couldn't 
read myurls-dfs from the DFS and ended up creating a nearly-empty sequence 
directory.


Running:
MAHOUT_HOME=. HADOOP_HOME=/usr/lib/hadoop-0.20 
HADOOP_CONF_DIR=/etc/hadoop-0.20/conf ./bin/mahout seqdirectory -i myurls-local 
-o myurls-seqdir -c UTF-8 -chunk 

acts as expected, creating a substantial myurls-seqdir on the DFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to