[ 
https://issues.apache.org/jira/browse/MAHOUT-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840617#action_12840617
 ] 

Drew Farris commented on MAHOUT-320:
------------------------------------

I certainly can't argue about the space savings. VInts are definitely more 
efficient in that sense. However, the conversion to/from VInts gets expensive 
because, during the sort, Bigram performs multiple decodes for every pair that 
is compared. The question that remains is whether the cost of slinging the 
extra data around offsets the gain from not doing those decodes in 
IntPairWritable.
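
To make the trade-off concrete, here is a minimal sketch of what "binary 
comparable" means for a pair of ints. It is plain Java with no Hadoop 
dependency, and it is not the code in MAHOUT-320.patch: the class name 
FixedWidthPairSketch and the methods encode and compareRaw are hypothetical, 
and flipping the sign bit is just one possible way to make byte order agree 
with signed int order. Each pair always costs 8 bytes, where a variable-length 
encoding would need fewer for small values, but the comparison never decodes 
anything.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the implementation from the patch.
final class FixedWidthPairSketch {

    // Serialize a pair as 8 big-endian bytes (ByteBuffer's default order).
    // XOR with Integer.MIN_VALUE flips the sign bit so that comparing the
    // bytes as unsigned values gives the same order as comparing the ints.
    static byte[] encode(int first, int second) {
        return ByteBuffer.allocate(8)
                .putInt(first ^ Integer.MIN_VALUE)
                .putInt(second ^ Integer.MIN_VALUE)
                .array();
    }

    // Compare two serialized pairs byte by byte, without rebuilding the ints.
    // The first four bytes hold the first element, so it dominates the order.
    static int compareRaw(byte[] a, byte[] b) {
        for (int i = 0; i < 8; i++) {
            int x = a[i] & 0xff; // treat each byte as unsigned
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return 0;
    }
}
```

With a variable-length encoding, the same comparison would first have to 
decode both ints on each side, which is the per-pair cost described above.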

> Modify IntPairWritable in LDA implementation to be binary comparable to 
> improve performance.
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-320
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-320
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.3
>            Reporter: Drew Farris
>            Assignee: Robin Anil
>            Priority: Minor
>         Attachments: MAHOUT-320.patch
>
>
> Per discussion with Robin, modifying o.a.m.clustering.lda.IntPairWritable to 
> be binary comparable will improve the performance of the comparison 
> operations during a sort because no marshaling will need to occur to compare 
> IntPairWritable instances.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
