[ 
https://issues.apache.org/jira/browse/MAHOUT-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630727#comment-13630727
 ] 

Saikat Kanjilal commented on MAHOUT-974:
----------------------------------------

I am reading through PreparePreferenceMatrixJob, and I was wondering whether, by 
the mapping from longs to ints, you are referring to the following lines of code:
    //convert items to an internal index
    Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX), TextInputFormat.class,
            ItemIDIndexMapper.class, VarIntWritable.class, VarLongWritable.class, ItemIDIndexReducer.class,
            VarIntWritable.class, VarLongWritable.class, SequenceFileOutputFormat.class);
    itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
    boolean succeeded = itemIDIndex.waitForCompletion(true);
    if (!succeeded) {
      return -1;
    }
    //convert user preferences into a vector per user
    Job toUserVectors = prepareJob(getInputPath(),
                                   getOutputPath(USER_VECTORS),
                                   TextInputFormat.class,
                                   ToItemPrefsMapper.class,
                                   VarLongWritable.class,
                                   booleanData ? VarLongWritable.class : EntityPrefWritable.class,
                                   ToUserVectorsReducer.class,
                                   VarLongWritable.class,
                                   VectorWritable.class,
                                   SequenceFileOutputFormat.class);

Pardon my ignorance, as this is my first time looking at this code, but I don't 
see any other part of this class that resembles a mapping. Also, Sebastian, I'm 
wondering whether the mapping itself should live in mahout-core so that 
multiple jobs can leverage it.
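
To make the question concrete, here is a minimal, self-contained sketch of what I understand a long-to-int ID mapping to involve. It is not the Mahout code: the class IdIndexSketch, both method names, the folding function, and the keep-the-smallest-ID collision rule are all assumptions made for illustration only. The idea is that the map step folds each long ID into a non-negative int index, and the reduce step stores one original ID per index so that results can later be translated back.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a hypothetical stand-in, not the Mahout implementation.
public class IdIndexSketch {

  // Fold a 64-bit ID into a non-negative 32-bit index (assumed hashing scheme).
  static int idToIndex(long id) {
    int folded = (int) (id ^ (id >>> 32)); // mix the high and low 32 bits
    return 0x7FFFFFFF & folded;            // clear the sign bit
  }

  // Build an index -> original ID table, similar in spirit to an ID index output.
  static Map<Integer, Long> buildIndex(long[] ids) {
    Map<Integer, Long> index = new HashMap<>();
    for (long id : ids) {
      // Distinct longs can share an index; this sketch keeps the smallest ID
      // (an assumed, deterministic tie-break).
      index.merge(idToIndex(id), id, Math::min);
    }
    return index;
  }

  public static void main(String[] args) {
    // 4294967339L is 2^32 + 43, which folds to the same index as 42L.
    long[] itemIDs = {42L, 4294967339L, 9000000000L};
    for (Map.Entry<Integer, Long> e : buildIndex(itemIDs).entrySet()) {
      System.out.println("index " + e.getKey() + " -> itemID " + e.getValue());
    }
  }
}
```

Because the fold is lossy, two different long IDs can land on the same int index, so any job that shares such a mapping would also have to share the same collision rule.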

                
> org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob  use 
> integer as userId and itemId
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-974
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-974
>             Project: Mahout
>          Issue Type: Wish
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Han Hui Wen 
>            Assignee: Sebastian Schelter
>              Labels: CF,recommendation,als
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob uses 
> integer as userId and itemId, but 
> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob and 
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob use Long as userId and 
> itemId.
> It would be best if ParallelALSFactorizationJob also used Long as userId and 
> itemId, so that the same dataset can be used with all the recommendation algorithms.
