Replace Mapper with MultithreadedMapper to implement
org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.CooccurrencesMapper
------------------------------------------------------------------------------------------------------------------------------------
Key: MAHOUT-475
URL: https://issues.apache.org/jira/browse/MAHOUT-475
Project: Mahout
Issue Type: Improvement
Components: Collaborative Filtering
Affects Versions: 0.4
Reporter: Han Hui Wen
Fix For: 0.4
CooccurrencesMapper is computationally expensive, so we could replace
Mapper with MultithreadedMapper to use several threads per map task.
Original:
{code}
public static class CooccurrencesMapper
    extends Mapper<VarIntWritable,WeightedOccurrenceArray,WeightedRowPair,Cooccurrence>
{code}
New:
{code}
public static class CooccurrencesMapper
    extends MultithreadedMapper<VarIntWritable,WeightedOccurrenceArray,WeightedRowPair,Cooccurrence>
{code}
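As background for the change above: Hadoop's MultithreadedMapper (org.apache.hadoop.mapreduce.lib.map) runs several threads inside one map task, so the map logic and anything it shares must be thread-safe. The sketch below is plain Java, not Hadoop or Mahout code; the class and method names are hypothetical and only illustrate the pattern of workers sharing one input source and one output collector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical illustration only; this is not how Hadoop is implemented internally.
final class MultithreadedMapSketch {
    private MultithreadedMapSketch() {}

    /** Applies mapFn to every record using `threads` workers that share one input iterator. */
    static <I, O> List<O> mapInParallel(List<I> records, int threads, Function<I, O> mapFn) {
        Iterator<I> input = records.iterator();
        List<O> output = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                while (true) {
                    I record;
                    synchronized (input) {            // reads of the shared input are serialized
                        if (!input.hasNext()) {
                            return;                   // no records left, this worker is done
                        }
                        record = input.next();
                    }
                    output.add(mapFn.apply(record));  // the expensive work runs concurrently
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Integer> squares = mapInParallel(Arrays.asList(1, 2, 3, 4, 5), 3, x -> x * x);
        Collections.sort(squares);                    // output order is not deterministic
        System.out.println(squares);
    }
}
```

Note that in Hadoop itself, MultithreadedMapper is usually registered as the job's mapper class and the real mapper is handed to it with MultithreadedMapper.setMapperClass(job, ...), rather than being subclassed; whether subclassing is appropriate for CooccurrencesMapper would need to be checked against the Hadoop version in use.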
The job setup that uses the mapper changes as well.
Original:
{code}
if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  Job pairwiseSimilarity = prepareJob(weightsPath,
                                      pairwiseSimilarityPath,
                                      SequenceFileInputFormat.class,
                                      CooccurrencesMapper.class,
                                      WeightedRowPair.class,
                                      Cooccurrence.class,
                                      SimilarityReducer.class,
                                      SimilarityMatrixEntryKey.class,
                                      MatrixEntryWritable.class,
                                      SequenceFileOutputFormat.class);
  Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
  pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME, distributedSimilarityClassname);
  pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
  pairwiseSimilarity.waitForCompletion(true);
}
{code}
New:
{code}
if (shouldRunNextPhase(parsedArgs, currentPhase)) {
  Job pairwiseSimilarity = prepareJob(weightsPath,
                                      pairwiseSimilarityPath,
                                      SequenceFileInputFormat.class,
                                      CooccurrencesMapper.class,
                                      WeightedRowPair.class,
                                      Cooccurrence.class,
                                      SimilarityReducer.class,
                                      SimilarityMatrixEntryKey.class,
                                      MatrixEntryWritable.class,
                                      SequenceFileOutputFormat.class);
  Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
  pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME, distributedSimilarityClassname);
  pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
  // setNumberOfThreads takes the Job as its first argument.
  // n should be no more than roughly the number of cores on the machine.
  CooccurrencesMapper.setNumberOfThreads(pairwiseSimilarity, n);
  pairwiseSimilarity.waitForCompletion(true);
}
{code}
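One possible way to pick n, sketched under the assumption that "a bit less than the core count" means one thread fewer than the number of cores, with a floor of one. The helper class below is hypothetical and is not part of Mahout or Hadoop.

```java
// Hypothetical helper for choosing the thread count n; not part of Mahout or Hadoop.
final class MapperThreadCount {
    private MapperThreadCount() {}

    /** Returns one less than the given core count, but never less than 1. */
    static int forCores(int cores) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be >= 1");
        }
        return Math.max(1, cores - 1);
    }

    /** Uses the core count of the JVM this code runs in. */
    static int forThisMachine() {
        return forCores(Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        System.out.println("threads = " + forThisMachine());
    }
}
```

Keep in mind that Runtime.availableProcessors() reports the cores of the machine that submits the job, which may differ from the task nodes, so a configurable value is probably safer on a heterogeneous cluster.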