Markus Paaso created MAHOUT-1055:
------------------------------------

             Summary: Change id fields to use LongWritable instead of IntWritable
                 Key: MAHOUT-1055
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1055
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.7
            Reporter: Markus Paaso


Why is IntWritable used as the id field type in Mahout CVB 
(org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper)?
Does using long instead of int really have a significant impact on performance?
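
As a rough point of reference for the performance question: Hadoop's 
IntWritable serializes its value with DataOutput.writeInt and LongWritable 
with DataOutput.writeLong, so a long id costs 4 extra bytes per key on disk 
and on the wire. The sketch below reproduces only that size difference with 
plain java.io; it uses no Hadoop classes, the names IdSize, 
serializedIntSize and serializedLongSize are hypothetical, and it makes no 
claim about end-to-end job performance.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration; not part of Mahout or Hadoop.
final class IdSize {

    // Number of bytes produced by writing an int id the way
    // DataOutput.writeInt does.
    static int serializedIntSize(int id) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Number of bytes produced by writing a long id the way
    // DataOutput.writeLong does.
    static int serializedLongSize(long id) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(id);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("int id bytes:  " + serializedIntSize(7));
        System.out.println("long id bytes: " + serializedLongSize(7L));
    }
}
```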

long is much more usable as an id type, and int causes compatibility issues 
like the one described below.

In the method org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(), 
LongWritable is correctly used as the id field type.

I suggest that every IntWritable id should be changed to LongWritable.

The SequenceFile produced by the command 'mahout lucene.vector' cannot be 
processed by the command 'mahout cvb' because of this id type incompatibility.
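
To illustrate why long ids cannot simply be treated as int ids, here is a 
minimal plain-Java sketch (no Hadoop or Mahout classes; IdNarrowing and 
toIntId are hypothetical names). A long id fits into an int only while it 
stays within the 32-bit range. A plain cast wraps around silently, whereas 
Math.toIntExact fails loudly.

```java
// Hypothetical helper class for illustration; not part of Mahout or Hadoop.
final class IdNarrowing {

    // Narrow a long id to an int id, throwing ArithmeticException instead of
    // silently wrapping around when the value does not fit in 32 bits.
    static int toIntId(long id) {
        return Math.toIntExact(id);
    }

    public static void main(String[] args) {
        // A small id narrows without loss.
        System.out.println(toIntId(42L));

        // One past Integer.MAX_VALUE: a plain cast wraps to a negative value.
        long tooBig = Integer.MAX_VALUE + 1L;
        System.out.println((int) tooBig);

        // The checked conversion rejects the same value.
        try {
            toIntId(tooBig);
        } catch (ArithmeticException e) {
            System.out.println("id out of int range: " + tooBig);
        }
    }
}
```

Any tool that re-keys a LongWritable-keyed file to IntWritable would need a 
check of this kind, which is one more argument for using long ids throughout.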

See http://mahout.markmail.org/thread/r3m6ojkpbzlxxizy


        
