Replace Mapper with MultithreadedMapper to implement org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.CooccurrencesMapper
------------------------------------------------------------------------------------------------------------------------------------

                 Key: MAHOUT-475
                 URL: https://issues.apache.org/jira/browse/MAHOUT-475
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
    Affects Versions: 0.4
            Reporter: Han Hui Wen 
             Fix For: 0.4


CooccurrencesMapper is very computation-intensive, so it may be worth running it through MultithreadedMapper, which executes several instances of a mapper in parallel threads within a single map task.
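A rough picture of what MultithreadedMapper does inside one map task may help when judging whether it pays off here. The sketch below is a plain-JDK model, not Hadoop's code: the names MultithreadedMapSketch, SimpleMapper and runMultithreaded are hypothetical, and a List stands in for Hadoop's Context. As in Hadoop's MultithreadedMapper, each thread gets its own mapper instance, while fetching the next input record and emitting output are serialized across threads. Only the map() work itself runs in parallel, so any gain depends on map() being CPU-heavy compared with reading and writing records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class MultithreadedMapSketch {

  /** Hypothetical stand-in for a Hadoop Mapper: turns one input record into zero or more outputs. */
  public interface SimpleMapper<I, O> {
    void map(I value, List<O> out);
  }

  /**
   * Runs numThreads mapper instances, one per thread, over a shared input iterator.
   * Reading input and collecting output are synchronized; only map() runs in parallel.
   */
  public static <I, O> List<O> runMultithreaded(Iterator<I> input,
                                                Supplier<SimpleMapper<I, O>> mapperFactory,
                                                int numThreads) {
    List<O> collected = new ArrayList<>();
    List<Thread> threads = new ArrayList<>();
    for (int t = 0; t < numThreads; t++) {
      SimpleMapper<I, O> mapper = mapperFactory.get(); // fresh instance per thread, no shared state
      Thread thread = new Thread(() -> {
        List<O> local = new ArrayList<>();
        while (true) {
          I value;
          synchronized (input) {           // shared reader: one thread at a time
            if (!input.hasNext()) {
              return;
            }
            value = input.next();
          }
          local.clear();
          mapper.map(value, local);        // the expensive part runs concurrently
          synchronized (collected) {       // shared writer: one thread at a time
            collected.addAll(local);
          }
        }
      });
      threads.add(thread);
      thread.start();
    }
    try {
      for (Thread thread : threads) {
        thread.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for mapper threads", e);
    }
    return collected;
  }

  public static void main(String[] args) {
    // Toy stand-in for cooccurrence computation: emit every item pair (i < j) of each row.
    Supplier<SimpleMapper<int[], String>> pairMapperFactory = () -> (row, out) -> {
      for (int i = 0; i < row.length; i++) {
        for (int j = i + 1; j < row.length; j++) {
          out.add(row[i] + "-" + row[j]);
        }
      }
    };
    List<int[]> rows = List.of(new int[] {1, 2, 3}, new int[] {2, 4}, new int[] {5});
    List<String> pairs = runMultithreaded(rows.iterator(), pairMapperFactory, 4);
    System.out.println(pairs.size()); // 4
  }
}
```

The pairs (1-2, 1-3, 2-3, 2-4) arrive in nondeterministic order because the threads interleave; only the collection as a whole is deterministic.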

Original:
{code}
  public static class CooccurrencesMapper
      extends Mapper<VarIntWritable,WeightedOccurrenceArray,WeightedRowPair,Cooccurrence>
{code}

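New: CooccurrencesMapper keeps its declaration unchanged and is wrapped by MultithreadedMapper instead of extending it. MultithreadedMapper's run() method does not call map() on itself; it starts the configured number of threads, each running its own instance of the class registered with MultithreadedMapper.setMapperClass(). A CooccurrencesMapper that extended MultithreadedMapper would therefore never have its map() invoked, and setMapperClass() explicitly rejects subclasses of MultithreadedMapper. Because the instances run concurrently in one JVM, CooccurrencesMapper must not share mutable state (for example static fields) between them.

Then register MultithreadedMapper as the job's mapper and CooccurrencesMapper as the class it runs.
Original:
{code}
    if (shouldRunNextPhase(parsedArgs, currentPhase)) {
      Job pairwiseSimilarity = prepareJob(weightsPath,
                               pairwiseSimilarityPath,
                               SequenceFileInputFormat.class,
                               CooccurrencesMapper.class,
                               WeightedRowPair.class,
                               Cooccurrence.class,
                               SimilarityReducer.class,
                               SimilarityMatrixEntryKey.class,
                               MatrixEntryWritable.class,
                               SequenceFileOutputFormat.class);

      Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
      pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME, distributedSimilarityClassname);
      pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
      pairwiseSimilarity.waitForCompletion(true);
    }
{code}

New:
{code}
    if (shouldRunNextPhase(parsedArgs, currentPhase)) {
      Job pairwiseSimilarity = prepareJob(weightsPath,
                               pairwiseSimilarityPath,
                               SequenceFileInputFormat.class,
                               MultithreadedMapper.class, // runs CooccurrencesMapper in several threads
                               WeightedRowPair.class,
                               Cooccurrence.class,
                               SimilarityReducer.class,
                               SimilarityMatrixEntryKey.class,
                               MatrixEntryWritable.class,
                               SequenceFileOutputFormat.class);

      Configuration pairwiseConf = pairwiseSimilarity.getConfiguration();
      pairwiseConf.set(DISTRIBUTED_SIMILARITY_CLASSNAME, distributedSimilarityClassname);
      pairwiseConf.setInt(NUMBER_OF_COLUMNS, numberOfColumns);
      // org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper: the class each thread instantiates
      MultithreadedMapper.setMapperClass(pairwiseSimilarity, CooccurrencesMapper.class);
      // threads per map task; n should not exceed the number of cores on the machine running the task
      MultithreadedMapper.setNumberOfThreads(pairwiseSimilarity, n);
      pairwiseSimilarity.waitForCompletion(true);
    }
{code}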
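The thread count n is not defined in the snippet above. One way to pick it, sketched here under the assumption that every map slot on a worker node can be busy at the same time, is to divide the node's cores by its number of map slots (mapred.tasktracker.map.tasks.maximum in Hadoop 0.20), so that slots × threads stays within the core count. The class MapperThreadCount and its method threadsPerMapTask are hypothetical. Note that Runtime.getRuntime().availableProcessors() in the job driver reports the cores of the client machine, not of the worker nodes, so the core count has to come from knowledge of the cluster.

```java
public final class MapperThreadCount {

  private MapperThreadCount() {}

  /**
   * Hypothetical helper: threads per map task so that all concurrent map tasks
   * on a node together use at most its cores (assuming all slots are busy).
   *
   * @param coresPerNode    CPU cores on each worker node
   * @param mapSlotsPerNode concurrent map tasks per node
   * @return threads per map task, at least 1
   */
  public static int threadsPerMapTask(int coresPerNode, int mapSlotsPerNode) {
    if (coresPerNode < 1 || mapSlotsPerNode < 1) {
      throw new IllegalArgumentException("cores and map slots must be positive");
    }
    // Integer division rounds down; fall back to 1 thread when there are more slots than cores.
    return Math.max(1, coresPerNode / mapSlotsPerNode);
  }

  public static void main(String[] args) {
    System.out.println(threadsPerMapTask(8, 2)); // 4
    System.out.println(threadsPerMapTask(4, 8)); // 1
  }
}
```

With 8 cores and 2 map slots per node, for example, this suggests 4 threads per map task.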

-- 
This message is automatically generated by JIRA.