[
https://issues.apache.org/jira/browse/MAHOUT-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898223#action_12898223
]
Han Hui Wen commented on MAHOUT-475:
-------------------------------------
About MultithreadedMapper, here is the source:
http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java
It uses a thread pool to process multiple key-value pairs concurrently and
asynchronously.
It is a good fit when the map task is computationally expensive, and
RowSimilarityJob-CooccurrencesMapper-SimilarityReducer is exactly such a case.
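To make the idea concrete, here is a minimal stand-alone sketch of the pattern, not the Hadoop code itself. It assumes only the JDK; the class MultithreadedMapSketch and the method runMap are hypothetical names made up for illustration. Several worker threads pull records from one shared input under a lock and write results to a thread-safe output, which is roughly how a multithreaded mapper keeps all cores busy when the per-record work is heavy.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical stand-alone sketch of a multithreaded map phase; not Hadoop code. */
class MultithreadedMapSketch {

  /**
   * Applies mapFn to every record using numThreads workers that all pull
   * from one shared input iterator. mapFn must not return null.
   */
  static <K, V, R> Map<K, R> runMap(Map<K, V> input, Function<V, R> mapFn, int numThreads) {
    Iterator<Map.Entry<K, V>> records = input.entrySet().iterator();
    Map<K, R> output = new ConcurrentHashMap<>();
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    for (int t = 0; t < numThreads; t++) {
      pool.execute(() -> {
        while (true) {
          Map.Entry<K, V> record;
          // Reading the next record is serialized, like a shared record reader.
          synchronized (records) {
            if (!records.hasNext()) {
              return;
            }
            record = records.next();
          }
          // The expensive per-record computation runs outside the lock.
          output.put(record.getKey(), mapFn.apply(record.getValue()));
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("map phase timed out");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("map phase interrupted", e);
    }
    return output;
  }

  public static void main(String[] args) {
    Map<Integer, Integer> input = new HashMap<>();
    for (int i = 1; i <= 100; i++) {
      input.put(i, i);
    }
    Map<Integer, Integer> squares = runMap(input, v -> v * v, 4);
    System.out.println(squares.get(12)); // prints 144
  }
}
```

The gain only shows up when the map function dominates the cost. If reading the input is the bottleneck, the lock around the shared iterator serializes the workers and extra threads add little.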
> Replace Mapper with MultithreadedMapper to run job pairwiseSimilarity
> ------------------------------------------------------------------------
>
> Key: MAHOUT-475
> URL: https://issues.apache.org/jira/browse/MAHOUT-475
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.4
> Reporter: Han Hui Wen
> Assignee: Sean Owen
> Fix For: 0.4
>
> Attachments: after_patch_20100813.jpg, patch_985097.txt
>
>
> Because CooccurrencesMapper is computationally heavy,
> maybe we can replace Mapper with MultithreadedMapper
> and have it invoke the mapper.
> original:
> {code}
> if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>   Job pairwiseSimilarity = prepareJob(weightsPath,
>       pairwiseSimilarityPath,
>       SequenceFileInputFormat.class,
>       CooccurrencesMapper.class,
>       WeightedRowPair.class,
>       Cooccurrence.class,
>       SimilarityReducer.class,
>       SimilarityMatrixEntryKey.class,
>       MatrixEntryWritable.class,
>       SequenceFileOutputFormat.class);
>   Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
>   pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME,
>       distributedSimilarityClassname);
>   pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
>   pairwiseSimilarity.waitForCompletion(true);
> }
> {code}
> new:
> {code}
> if (shouldRunNextPhase(parsedArgs, currentPhase)) {
>   // The job's mapper must be MultithreadedMapper itself; it delegates
>   // to the CooccurrencesMapper registered via setMapperClass() below.
>   Job pairwiseSimilarity = prepareJob(weightsPath,
>       pairwiseSimilarityPath,
>       SequenceFileInputFormat.class,
>       MultithreadedMapper.class,
>       WeightedRowPair.class,
>       Cooccurrence.class,
>       SimilarityReducer.class,
>       SimilarityMatrixEntryKey.class,
>       MatrixEntryWritable.class,
>       SequenceFileOutputFormat.class);
>
>   Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
>   pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME,
>       distributedSimilarityClassname);
>   pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
>   MultithreadedMapper.setMapperClass(pairwiseSimilarity,
>       CooccurrencesMapper.class);
>   MultithreadedMapper.setNumberOfThreads(pairwiseSimilarity,
>       numMapThreads);
>   SequenceFileOutputFormat.setCompressOutput(pairwiseSimilarity, true);
>   SequenceFileOutputFormat.setOutputCompressorClass(pairwiseSimilarity,
>       GzipCodec.class);
>   SequenceFileOutputFormat.setOutputCompressionType(pairwiseSimilarity,
>       CompressionType.BLOCK);
>   pairwiseSimilarity.waitForCompletion(true);
> }
> {code}