[ 
https://issues.apache.org/jira/browse/MAHOUT-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434014#comment-13434014
 ] 

Markus Paaso commented on MAHOUT-1055:
--------------------------------------

About compatibility:
Is there any reason for 'mahout lucene.vector' to use LongWritable while 
'mahout cvb' uses IntWritable?

About id type:
Ids are not reusable, and in a system that consumes and produces huge amounts 
of data, ids will exhaust the Integer range in a relatively short time.
The situation is the same as with IPv4 addresses or Y2K.

Given how data volumes in computing systems keep growing, it is not a bad 
idea to use Long as the id field type.
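
To put rough numbers on the range argument, here is a minimal Java sketch. 
It assumes a hypothetical allocation rate of 1,000 new ids per second; that 
rate, the class name and the method names are illustrative only and are not 
taken from Mahout. It estimates how long each range lasts and shows that an 
int counter wraps silently at its upper bound.

```java
public class IdRangeSketch {

    /** Whole days until a counter starting at 0 reaches maxId at a fixed rate. */
    static long daysUntilExhausted(long maxId, long idsPerSecond) {
        long seconds = maxId / idsPerSecond;
        return seconds / 86_400L; // 86,400 seconds per day
    }

    /** Next id from an int counter; overflows silently past Integer.MAX_VALUE. */
    static int nextIntId(int current) {
        return current + 1;
    }

    /** Next id from a long counter; has ample headroom above the int range. */
    static long nextLongId(long current) {
        return current + 1;
    }

    public static void main(String[] args) {
        long rate = 1_000L; // assumed rate: 1,000 new ids per second

        System.out.println("int range lasts (days):  "
                + daysUntilExhausted(Integer.MAX_VALUE, rate));
        System.out.println("long range lasts (days): "
                + daysUntilExhausted(Long.MAX_VALUE, rate));

        // The int counter wraps to a negative value instead of failing.
        System.out.println("int after max:  " + nextIntId(Integer.MAX_VALUE));
        // The long counter simply continues past the int range.
        System.out.println("long after max: " + nextLongId(Integer.MAX_VALUE));
    }
}
```

At the assumed rate the int range is used up in under a month, while the 
long range lasts far beyond any practical horizon.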
                
> Change id fields to use LongWritable instead of IntWritable
> -----------------------------------------------------------
>
>                 Key: MAHOUT-1055
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1055
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>            Reporter: Markus Paaso
>
> Why is IntWritable used as id field type in Mahout CVB? 
> (org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper)
> Does Long have such a significant impact on performance?
> Long is much more usable as an id type, and int causes compatibility issues 
> like the one below.
> In method org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter() 
> LongWritable is used correctly as id field type.
> I suggest that every IntWritable id should be changed to LongWritable.
> The sequence file produced by the command 'mahout lucene.vector' cannot be 
> handled by the command 'mahout cvb' due to this id type incompatibility.
> See http://mahout.markmail.org/thread/r3m6ojkpbzlxxizy


        
