I am trying to integrate the Solr-recommender with the latest Mahout snapshot. The 
project uses a modified RecommenderJob because it needs SequenceFile output and 
to get the location of the preparePreferenceMatrix directory. If #1 and #2 are 
addressed I can remove the modified Mahout code from the project and rely on 
the default implementations in Mahout 0.9. #3 is a longer term issue related to 
the creation of a CrossRowSimilarityJob. 

I have dropped the modified code from the Solr-recommender project and have a 
modified build of the current Mahout 0.9 snapshot. If the following changes are 
made to Mahout I can test and release a Mahout 0.9 version of the 
Solr-recommender.

1. Option to change RecommenderJob output format

Can someone add an option to output a SequenceFile? I modified the code as 
shown below. Note the SequenceFileOutputFormat.class as the last parameter; 
I think this should really be determined by an option.

      Job aggregateAndRecommend = prepareJob(
              new Path(aggregateAndRecommendInput), outputPath,
              SequenceFileInputFormat.class,
              PartialMultiplyMapper.class, VarLongWritable.class,
              PrefAndSimilarityColumnWritable.class,
              AggregateAndRecommendReducer.class, VarLongWritable.class,
              RecommendedItemsWritable.class,
              SequenceFileOutputFormat.class);

2. Visibility of preparePreferenceMatrix directory location

The Solr-recommender needs to find where the RecommenderJob is putting its 
output. 

Mahout 0.8 RecommenderJob code was:
    public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";

Mahout 0.9 RecommenderJob code just puts “preparePreferenceMatrix” inline in 
the code:
    Path prepPath = getTempPath("preparePreferenceMatrix");

This change to Mahout 0.9 works:
    public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
and
    Path prepPath = getTempPath(DEFAULT_PREPARE_DIR);

You could also make this a getter method on the RecommenderJob class instead of 
using a public constant.
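
To illustrate the getter alternative, here is a rough, self-contained sketch. 
It is not Mahout code: PrepareDirSketch, its tempDir field and getPrepPath() 
are hypothetical stand-ins for RecommenderJob and its temp-path handling, and 
the string joining only approximates what getTempPath does.

```java
// Hypothetical stand-in for RecommenderJob; names here are assumptions,
// not part of the Mahout API.
public class PrepareDirSketch {

    // Same name and value as the Mahout 0.8 constant quoted above.
    public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";

    // Assumed: the job's temp directory, kept as a plain string here.
    private final String tempDir;

    public PrepareDirSketch(String tempDir) {
        this.tempDir = tempDir;
    }

    // Getter that callers such as the Solr-recommender could use
    // instead of hard-coding the directory name.
    public String getPrepPath() {
        String separator = tempDir.endsWith("/") ? "" : "/";
        return tempDir + separator + DEFAULT_PREPARE_DIR;
    }

    public static void main(String[] args) {
        PrepareDirSketch job = new PrepareDirSketch("temp");
        System.out.println(job.getPrepPath());
    }
}
```

With a getter, the directory name could later change inside Mahout without 
breaking projects that read the prepared matrix.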

3. Downsampling

The downsampling for maximum prefs per user has been moved from 
PreparePreferenceMatrixJob to RowSimilarityJob. The XRecommenderJob uses matrix 
math instead of RSJ, so it will no longer support downsampling until there is a 
hypothetical CrossRowSimilarityJob with downsampling in it.
