[ https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630727#comment-13630727 ]
Saikat Kanjilal commented on MAHOUT-974:
----------------------------------------
I am reading through PreparePreferenceMatrixJob and was wondering whether, by
"mapping from longs to ints", you're referring to the following lines of code:

    //convert items to an internal index
    Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX),
        TextInputFormat.class,
        ItemIDIndexMapper.class, VarIntWritable.class, VarLongWritable.class,
        ItemIDIndexReducer.class, VarIntWritable.class, VarLongWritable.class,
        SequenceFileOutputFormat.class);
    itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
    boolean succeeded = itemIDIndex.waitForCompletion(true);
    if (!succeeded) {
      return -1;
    }
    //convert user preferences into a vector per user
    Job toUserVectors = prepareJob(getInputPath(), getOutputPath(USER_VECTORS),
        TextInputFormat.class,
        ToItemPrefsMapper.class, VarLongWritable.class,
        booleanData ? VarLongWritable.class : EntityPrefWritable.class,
        ToUserVectorsReducer.class, VarLongWritable.class, VectorWritable.class,
        SequenceFileOutputFormat.class);

Pardon my ignorance, as this is my first time looking at this code, but I don't
see any other parts of this class that resemble a mapping. Also, Sebastian, I'm
wondering whether the mapping itself needs to live in mahout-core so that
multiple jobs can leverage it.
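
To make the question concrete, here is a minimal standalone sketch of what a
reusable long-to-int ID mapping could look like. It is not the actual Mahout
code: the class and method names (IdIndexSketch, idToIndex, buildIndex) are
hypothetical, and both the hashing scheme and the "keep the smallest ID on
collision" policy are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of Mahout.
final class IdIndexSketch {

  // Fold a 64-bit ID into a non-negative 32-bit index.
  // Assumption: hash the long, then clear the sign bit.
  static int idToIndex(long id) {
    return 0x7FFFFFFF & Long.hashCode(id);
  }

  // Build the reverse lookup (index -> original ID).
  // Distinct longs can collide on the same index; this sketch keeps
  // the smallest ID in that case (an assumed policy).
  static Map<Integer, Long> buildIndex(long[] ids) {
    Map<Integer, Long> index = new HashMap<>();
    for (long id : ids) {
      index.merge(idToIndex(id), id, Long::min);
    }
    return index;
  }

  public static void main(String[] args) {
    long[] itemIDs = {42L, 5_000_000_000L, 42L};
    Map<Integer, Long> index = buildIndex(itemIDs);
    for (long id : itemIDs) {
      int idx = idToIndex(id);
      System.out.println(id + " -> " + idx + " -> " + index.get(idx));
    }
  }
}
```

If something along these lines lived in a shared place, a job that needs int
indices could translate IDs on the way in and use the reverse index to
translate results back to the original long IDs on the way out. Because the
mapping is lossy, any job relying on it would have to tolerate collisions.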
> org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob use
> integer as userId and itemId
> ---------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-974
> URL: https://issues.apache.org/jira/browse/MAHOUT-974
> Project: Mahout
> Issue Type: Wish
> Components: Collaborative Filtering
> Affects Versions: 0.6
> Reporter: Han Hui Wen
> Assignee: Sebastian Schelter
> Labels: CF,recommendation,als
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob uses
> integer as userId and itemId, but
> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob and
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob use Long as userId and
> itemId.
> It's best that ParallelALSFactorizationJob also uses Long as userId and
> itemId, so that the same dataset can be used with all the recommendation
> algorithms.