I am trying to integrate the Solr-recommender with the latest Mahout snapshot.
The project uses a modified RecommenderJob because it needs SequenceFile output
and needs to know the location of the preparePreferenceMatrix directory. If #1
and #2 below are addressed, I can remove the modified Mahout code from the
project and rely on the default implementations in Mahout 0.9. #3 is a
longer-term issue related to the creation of a CrossRowSimilarityJob.
I have dropped the modified code from the Solr-recommender project and now have
a modified build of the current Mahout 0.9 snapshot. If the following changes
are made to Mahout, I can test and release a Mahout 0.9 version of the
Solr-recommender.
1. Option to change RecommenderJob output format
Can someone add an option to output a SequenceFile? I modified the code as
shown below. Note the SequenceFileOutputFormat.class as the last parameter; I
think this should really be determined by an option rather than hard-coded.
    Job aggregateAndRecommend = prepareJob(
        new Path(aggregateAndRecommendInput), outputPath,
        SequenceFileInputFormat.class,
        PartialMultiplyMapper.class, VarLongWritable.class,
        PrefAndSimilarityColumnWritable.class,
        AggregateAndRecommendReducer.class, VarLongWritable.class,
        RecommendedItemsWritable.class,
        SequenceFileOutputFormat.class);
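To make the request concrete, here is a minimal sketch of how an option value could select the output format. The helper class OutputFormatOption and the option values "text" and "sequencefile" are hypothetical names of my own, not existing Mahout options, and the sketch assumes text output stays the default. It returns fully qualified class names as strings so that it compiles without Hadoop on the classpath.

```java
import java.util.Locale;

// Hypothetical helper, not part of Mahout: maps an option value to the
// fully qualified name of the Hadoop OutputFormat class to use.
final class OutputFormatOption {

    // Assumed default: plain text output.
    static final String TEXT =
        "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat";

    // Requested alternative: SequenceFile output.
    static final String SEQUENCE_FILE =
        "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat";

    private OutputFormatOption() {
    }

    // Returns the class name for the given option value. A missing or blank
    // value falls back to text, so existing callers see no change.
    static String resolve(String optionValue) {
        if (optionValue == null || optionValue.trim().isEmpty()) {
            return TEXT;
        }
        switch (optionValue.trim().toLowerCase(Locale.ROOT)) {
            case "text":
                return TEXT;
            case "sequencefile":
                return SEQUENCE_FILE;
            default:
                throw new IllegalArgumentException(
                    "Unknown output format: " + optionValue);
        }
    }
}
```

Inside RecommenderJob the resolved class would then replace the hard-coded last argument of prepareJob(...). How the option is declared and parsed is left open here.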
2. Visibility of preparePreferenceMatrix directory location
The Solr-recommender needs to find where the RecommenderJob is putting its
output.
Mahout 0.8 RecommenderJob code was:
public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
Mahout 0.9 RecommenderJob code hard-codes “preparePreferenceMatrix” inline
instead:
Path prepPath = getTempPath("preparePreferenceMatrix");
This change to Mahout 0.9 works:
public static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";
and
Path prepPath = getTempPath(DEFAULT_PREPARE_DIR);
You could also expose this through a getter method on the RecommenderJob class
instead of using a public constant.
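A rough sketch of the getter alternative, using a stand-in class so it compiles on its own. PrepareDirSketch, tempPath and getPrepareDir are hypothetical names. In Mahout the getter would live on RecommenderJob and call getTempPath(...), which returns a Path rather than the String used here.

```java
// Stand-in for RecommenderJob, for illustration only.
final class PrepareDirSketch {

    // The directory name stays private; callers go through the getter.
    private static final String DEFAULT_PREPARE_DIR = "preparePreferenceMatrix";

    private final String tempDir;

    PrepareDirSketch(String tempDir) {
        this.tempDir = tempDir;
    }

    // Simplified stand-in for getTempPath(String): resolves a directory
    // name against the job's temp directory.
    private String tempPath(String dirName) {
        return tempDir + "/" + dirName;
    }

    // Hypothetical getter: tells callers where the prepare step writes
    // its output without exposing the constant itself.
    String getPrepareDir() {
        return tempPath(DEFAULT_PREPARE_DIR);
    }
}
```

With a getter, the directory name can change later without breaking callers such as the Solr-recommender.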
3. Downsampling
The downsampling for maximum prefs per user has been moved from
PreparePreferenceMatrixJob to RowSimilarityJob (RSJ). The XRecommenderJob uses
matrix math instead of RSJ, so it will no longer support downsampling until
there is a hypothetical CrossRowSimilarityJob with downsampling in it.