[ 
https://issues.apache.org/jira/browse/MAHOUT-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898950#action_12898950
 ] 

Han Hui Wen  edited comment on MAHOUT-473 at 8/16/10 10:40 AM:
---------------------------------------------------------------

The patch is as follows.

In RecommenderJob

{code}
+    Job job = new Job(new Configuration(getConf()));
+    int numReduceTasks = job.getNumReduceTasks();

       try {
-        RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
-            "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
-            String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
-            String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
+        RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
+            "-Dmapred.output.dir=" + similarityMatrixPath.toString(),
+            "-Dmapred.reduce.tasks=" + numReduceTasks,
+            "--numberOfColumns", String.valueOf(numberOfUsers),
+            "--similarityClassname", similarityClassname,
+            "--maxSimilaritiesPerRow", String.valueOf(maxSimilaritiesPerItemConsidered + 1),
+            "--tempDir", tempDirPath.toString() });
       } catch (Exception e) {
         throw new IllegalStateException("item-item-similarity computation failed", e);
       }
{code}
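
The argument list in the patch can also be assembled in one place, which makes the position of the new -Dmapred.reduce.tasks entry easier to see. The following is only a sketch in plain Java with no Hadoop or Mahout dependency; the class RowSimilarityArgs and its build method are hypothetical names introduced for illustration and do not exist in Mahout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Mahout): assembles the RowSimilarityJob argument array. */
final class RowSimilarityArgs {

  private RowSimilarityArgs() { }

  static String[] build(String inputDir, String outputDir, int numReduceTasks,
                        int numberOfColumns, String similarityClassname,
                        int maxSimilaritiesPerRow, String tempDir) {
    List<String> args = new ArrayList<>();
    // The -D properties are kept ahead of the job-specific options, as in the patch.
    args.add("-Dmapred.input.dir=" + inputDir);
    args.add("-Dmapred.output.dir=" + outputDir);
    // The entry added by the patch: forward the caller's reducer count.
    args.add("-Dmapred.reduce.tasks=" + numReduceTasks);
    args.add("--numberOfColumns");
    args.add(String.valueOf(numberOfColumns));
    args.add("--similarityClassname");
    args.add(similarityClassname);
    args.add("--maxSimilaritiesPerRow");
    args.add(String.valueOf(maxSimilaritiesPerRow));
    args.add("--tempDir");
    args.add(tempDir);
    return args.toArray(new String[0]);
  }

  public static void main(String[] ignored) {
    // Placeholder paths and values, chosen only for this demonstration.
    for (String arg : build("in", "out", 8, 100, "SomeSimilarity", 51, "tmp")) {
      System.out.println(arg);
    }
  }
}
```

In RecommenderJob the returned array would be handed to RowSimilarityJob.main(...) in place of the inline array literal; whether a helper like this is worth adding is a judgment call.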


  
> add parameter -Dmapred.reduce.tasks when call job RowSimilarityJob in 
> RecommenderJob
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-473
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-473
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>         Attachments: screenshot-1.jpg
>
>
> In RecommenderJob
> {code:title=RecommenderJob.java|borderStyle=solid}
>     int numberOfUsers = TasteHadoopUtils.readIntFromFile(getConf(), countUsersPath);
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       /* Once DistributedRowMatrix uses the hadoop 0.20 API, we should refactor this call to something like
>        * new DistributedRowMatrix(...).rowSimilarity(...) */
>       try {
>         RowSimilarityJob.main(new String[] { "-Dmapred.input.dir=" + maybePruneItemUserMatrixPath.toString(),
>             "-Dmapred.output.dir=" + similarityMatrixPath.toString(), "--numberOfColumns",
>             String.valueOf(numberOfUsers), "--similarityClassname", similarityClassname, "--maxSimilaritiesPerRow",
>             String.valueOf(maxSimilaritiesPerItemConsidered + 1), "--tempDir", tempDirPath.toString() });
>       } catch (Exception e) {
>         throw new IllegalStateException("item-item-similarity computation failed", e);
>       }
>     }
> {code}
> RecommenderJob does not pass the parameter -Dmapred.reduce.tasks when it invokes RowSimilarityJob.
> As a result, all three RowSimilarityJob sub-jobs run with 1 map and 1 reduce task,
> so the sub-jobs do not scale.
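
The effect described above can be illustrated without Hadoop. The sketch below is a simplified stand-in for the way -Dkey=value arguments end up as configuration properties; it is not Hadoop's actual parser (for instance, it ignores the "-D key=value" form with a space), and the class ReduceTaskLookup is a hypothetical name. It assumes the stock Hadoop default of 1 for mapred.reduce.tasks, which applies whenever the property is not supplied.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for -D property handling; not Hadoop's real option parser. */
final class ReduceTaskLookup {

  /** Assumed stock default for mapred.reduce.tasks when nothing overrides it. */
  static final int DEFAULT_REDUCE_TASKS = 1;

  private ReduceTaskLookup() { }

  /** Collects every argument of the form -Dkey=value into a map. */
  static Map<String, String> parseDefines(String[] args) {
    Map<String, String> props = new HashMap<>();
    for (String arg : args) {
      if (arg.startsWith("-D")) {
        int eq = arg.indexOf('=');
        if (eq > 2) { // require a non-empty key between "-D" and "="
          props.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
      }
    }
    return props;
  }

  /** Returns the reducer count a job launched with these arguments would see. */
  static int reduceTasks(String[] args) {
    String value = parseDefines(args).get("mapred.reduce.tasks");
    return value == null ? DEFAULT_REDUCE_TASKS : Integer.parseInt(value);
  }

  public static void main(String[] ignored) {
    String[] before = { "-Dmapred.input.dir=in", "-Dmapred.output.dir=out", "--tempDir", "tmp" };
    String[] after = { "-Dmapred.input.dir=in", "-Dmapred.output.dir=out",
                       "-Dmapred.reduce.tasks=8", "--tempDir", "tmp" };
    System.out.println("without the parameter: " + reduceTasks(before));
    System.out.println("with the parameter: " + reduceTasks(after));
  }
}
```

Because RowSimilarityJob.main(...) receives only the arguments it is given, a reducer count configured for RecommenderJob does not reach the sub-jobs unless it is forwarded explicitly.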

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
