Markus Paaso created MAHOUT-1055:
------------------------------------
Summary: Change id fields to use LongWritable instead of IntWritable
Key: MAHOUT-1055
URL: https://issues.apache.org/jira/browse/MAHOUT-1055
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.7
Reporter: Markus Paaso
Why is IntWritable used as the id field type in Mahout CVB
(org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper)?
Does using long really have a significant impact on performance?
Long is much more usable as an id type, and int causes compatibility issues like
the one described below.
In the method org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter(),
LongWritable is correctly used as the id field type.
I suggest changing every IntWritable id to LongWritable.
A SequenceFile produced by the command 'mahout lucene.vector' cannot be handled by
the command 'mahout cvb' because of this id type incompatibility.
See http://mahout.markmail.org/thread/r3m6ojkpbzlxxizy
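
To illustrate why the two id widths do not mix, here is a minimal sketch in plain
Java of the narrowing step that any long-to-int re-keying workaround would need.
The class IdNarrowing and its methods are hypothetical names used only for
illustration and are not part of Mahout; reading and writing the SequenceFile
itself is omitted, and whether such a workaround is appropriate for a given data
set is an assumption.

```java
// Hypothetical helper, not part of Mahout: shows what happens when a
// 64-bit id (as stored in a LongWritable) is forced into 32 bits
// (as expected by an IntWritable key).
final class IdNarrowing {
    private IdNarrowing() {}

    /** Returns true if the long id can be represented as an int without loss. */
    static boolean fitsInInt(long id) {
        return id >= Integer.MIN_VALUE && id <= Integer.MAX_VALUE;
    }

    /** Narrows a long id to int; throws ArithmeticException on overflow. */
    static int toIntId(long id) {
        return Math.toIntExact(id);
    }

    public static void main(String[] args) {
        long small = 42L;
        long large = Integer.MAX_VALUE + 1L; // one past the int range

        System.out.println(toIntId(small));   // prints 42
        System.out.println(fitsInInt(large)); // prints false
        // A plain cast does not fail; it silently wraps around instead.
        System.out.println((int) large);      // prints -2147483648
    }
}
```

Ids that fit in an int survive the conversion unchanged, but anything larger
either fails loudly (Math.toIntExact) or is silently corrupted (plain cast),
which is one reason a uniform long id type is the safer choice.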
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira