[jira] Commented: (MAHOUT-475) Replace Mapper with MultithreadedMapper to run job pairwiseSimilarity

Han Hui Wen (JIRA) Fri, 13 Aug 2010 06:09:49 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898223#action_12898223
 ]


Han Hui Wen  commented on MAHOUT-475:
-------------------------------------

About MultithreadedMapper, here is the source:

http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java

It uses a thread pool to process multiple key-value pairs concurrently and
asynchronously.

It is a good fit when the map task is computationally expensive, and the
CooccurrencesMapper/SimilarityReducer step of RowSimilarityJob is exactly that
kind of task.
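
To make the idea concrete, here is a minimal sketch of the general pattern,
using only java.util.concurrent and no Hadoop classes. It is not Hadoop's
implementation: the class name MultithreadedMapSketch, the word-counting map()
stand-in, and the countWords() helper are all hypothetical and exist only for
illustration. Several worker threads pull records from one shared reader
(guarded by a lock) and write into a thread-safe output collector.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MultithreadedMapSketch {

    // Hypothetical stand-in for an expensive map() call:
    // emits (word, 1) for every word in the record.
    static void map(String record, Map<String, Integer> output) {
        for (String word : record.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // merge() is atomic on ConcurrentHashMap, so concurrent writers are safe.
                output.merge(word, 1, Integer::sum);
            }
        }
    }

    // Runs map() over all records using numThreads worker threads.
    public static Map<String, Integer> countWords(List<String> records, int numThreads) {
        Map<String, Integer> output = new ConcurrentHashMap<>();
        Iterator<String> reader = records.iterator(); // shared, not thread-safe by itself
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int i = 0; i < numThreads; i++) {
            pool.execute(() -> {
                while (true) {
                    String record;
                    // Only reading the next record is serialized; map() itself runs in parallel.
                    synchronized (reader) {
                        if (!reader.hasNext()) {
                            return;
                        }
                        record = reader.next();
                    }
                    map(record, output);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList("a b", "b c", "c c"), 4);
        System.out.println(counts);
    }
}
```

The sketch also shows the main caveat: whatever runs inside map() must be
thread-safe, and the gain only appears when map() is much more expensive than
reading a record.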

> Replace Mapper with  MultithreadedMapper  to run job pairwiseSimilarity 
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-475
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-475
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>            Assignee: Sean Owen
>             Fix For: 0.4
>
>         Attachments: after_patch_20100813.jpg, patch_985097.txt
>
>
> Because CooccurrencesMapper is computationally heavy,
> maybe we can replace Mapper with MultithreadedMapper
> and have it call CooccurrencesMapper.
> original:
> {code}
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       Job pairwiseSimilarity = prepareJob(weightsPath,
>                                pairwiseSimilarityPath,
>                                SequenceFileInputFormat.class,
>                                CooccurrencesMapper.class,
>                                WeightedRowPair.class,
>                                Cooccurrence.class,
>                                SimilarityReducer.class,
>                                SimilarityMatrixEntryKey.class,
>                                MatrixEntryWritable.class,
>                                SequenceFileOutputFormat.class);
>       Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
>       pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME,
>                        distributedSimilarityClassname);
>       pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
>       pairwiseSimilarity.waitForCompletion(true);
>     }
> {code}
> new:
> {code}
>     if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>       Job pairwiseSimilarity = prepareJob(weightsPath,
>                                pairwiseSimilarityPath,
>                                SequenceFileInputFormat.class,
>                                MultithreadedMapper.class, // wrapper; the real mapper is set below
>                                WeightedRowPair.class,
>                                Cooccurrence.class,
>                                SimilarityReducer.class,
>                                SimilarityMatrixEntryKey.class,
>                                MatrixEntryWritable.class,
>                                SequenceFileOutputFormat.class);
>       
>       Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
>       pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME,
>                        distributedSimilarityClassname);
>       pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
>       MultithreadedMapper.setMapperClass(pairwiseSimilarity,
>                                          CooccurrencesMapper.class);
>       MultithreadedMapper.setNumberOfThreads(pairwiseSimilarity,
>                                              numMapThreads);
>       SequenceFileOutputFormat.setCompressOutput(pairwiseSimilarity, true);
>       SequenceFileOutputFormat.setOutputCompressorClass(pairwiseSimilarity,
>                                                         GzipCodec.class);
>       SequenceFileOutputFormat.setOutputCompressionType(pairwiseSimilarity,
>                                                         CompressionType.BLOCK);
>       pairwiseSimilarity.waitForCompletion(true);
>     }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
