SparseVectorsFromSequenceFiles only outputs a single vector file
----------------------------------------------------------------

                 Key: MAHOUT-397
                 URL: https://issues.apache.org/jira/browse/MAHOUT-397
             Project: Mahout
          Issue Type: Improvement
          Components: Utils
    Affects Versions: 0.3
            Reporter: Jeff Eastman
            Assignee: Jeff Eastman
             Fix For: 0.4


When running LDA via build-reuters.sh on a 3-node Hadoop cluster, I noticed 
that the utility preprocessing steps produce only a single vector file. This 
means LDA (and the other clustering algorithms too) can use only a single 
mapper, no matter how large the cluster is. On investigation, it seems that the 
program argument (-nr) for setting the number of reducers, and hence the number 
of output files, is not propagated to the final stages where the output vectors 
are created.
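The link between reducer count, output file count and downstream mapper count can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not Mahout's code: it does not call Hadoop or Mahout, and the class and method names (ReducerFanOutSketch, runStage) are hypothetical. It only reuses the formula of Hadoop's default HashPartitioner to show that a stage run with N reduce tasks yields N part files. For inputs smaller than one HDFS block, each such file typically becomes one map split in the next job, so a final stage that ignores -nr and falls back to one reducer limits the clustering job to one mapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one vectorization stage; it does NOT use Hadoop.
// It mimics how Hadoop's default HashPartitioner routes keys to reducers,
// where each reducer normally writes one part file (even an empty one).
public class ReducerFanOutSketch {

    // Same formula as Hadoop's HashPartitioner.getPartition(...).
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Simulates one stage: each inner list stands for one reducer's part file.
    static List<List<String>> runStage(List<String> docIds, int numReduceTasks) {
        List<List<String>> partFiles = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partFiles.add(new ArrayList<>());
        }
        for (String id : docIds) {
            partFiles.get(partition(id, numReduceTasks)).add(id);
        }
        return partFiles;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("reut2-" + i); // hypothetical document ids
        }
        int nr = 3; // value a user might pass via -nr

        // Behavior described in the issue: the final stage ignores -nr.
        int buggy = runStage(docs, 1).size();
        // Intended behavior: -nr is propagated to the final stage.
        int fixed = runStage(docs, nr).size();
        System.out.println("buggy part files: " + buggy + ", fixed part files: " + fixed);
    }
}
```

Running main prints `buggy part files: 1, fixed part files: 3`.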

-- 
This message is automatically generated by JIRA.