[
https://issues.apache.org/jira/browse/MAHOUT-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433915#comment-13433915
]
Lance Norskog commented on MAHOUT-1055:
---------------------------------------
Do you have more than 2 billion keys, or sparse long keys with numeric values
above 2 billion?
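
For reference, the "2 billion" threshold is Integer.MAX_VALUE (2,147,483,647), the largest id an IntWritable can carry. Below is a minimal sketch of how one might check whether existing long ids would survive narrowing to int. The class and method names (IdNarrowing, fitsInInt, toIntId) are hypothetical and not part of Mahout or Hadoop; the sketch uses only the Java standard library.

```java
// Hypothetical helper, not part of Mahout or Hadoop: checks whether a long id
// can be represented as an int (the range an IntWritable key can hold).
public final class IdNarrowing {
    private IdNarrowing() {}

    /** Returns true when the long id can be stored in an int without loss. */
    public static boolean fitsInInt(long id) {
        return id >= Integer.MIN_VALUE && id <= Integer.MAX_VALUE;
    }

    /** Narrows a long id to int; throws ArithmeticException if it overflows. */
    public static int toIntId(long id) {
        return Math.toIntExact(id);
    }

    public static void main(String[] args) {
        // Sample ids chosen around the int boundary (illustrative values only).
        long[] ids = {0L, 2_147_483_647L, 2_147_483_648L};
        for (long id : ids) {
            System.out.println(id + " fits in int: " + fitsInInt(id));
        }
    }
}
```

If every id in a data set passes fitsInInt, narrowing the keys is lossless; a single id above the threshold means a plain cast would silently wrap around, which is why the sketch uses Math.toIntExact instead.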
> Change id fields to use LongWritable instead of IntWritable
> -----------------------------------------------------------
>
> Key: MAHOUT-1055
> URL: https://issues.apache.org/jira/browse/MAHOUT-1055
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.7
> Reporter: Markus Paaso
>
> Why is IntWritable used as the id field type in Mahout CVB?
> (org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper)
> Does Long have such a significant impact on performance?
> Long is much more usable as an id type, and int causes compatibility issues like
> the one below.
> In the method org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(),
> LongWritable is used correctly as the id field type.
> I suggest that every IntWritable id be changed to LongWritable.
> The SequenceFile produced by the command 'mahout lucene.vector' cannot be handled by
> the command 'mahout cvb' because of this id type incompatibility.
> See http://mahout.markmail.org/thread/r3m6ojkpbzlxxizy