[ 
https://issues.apache.org/jira/browse/MAHOUT-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434146#comment-13434146
 ] 

Ted Dunning commented on MAHOUT-1055:
-------------------------------------

{quote}
How about using the first 32 bits of long value as index of array and the last 
32 bits as index of element in array? Then you have an array of arrays that can 
be meta-indexed with a long. 
{quote}

This could work, but would be a royal pain in the butt, not to mention slow.

For sparse vectors and matrices, it is plausible to use longs as indexes, but 
for dense matrices it really makes little sense.  Memory use is one 
consideration, but the time to complete any single operation is another problem.
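
For illustration, the scheme in the quote above can be sketched roughly as follows. This is only a sketch under stated assumptions, not anything in Mahout: the class name LongIndexedArray and its methods are hypothetical. Because a Java array is indexed by int and cannot hold 2^32 elements, the split point is a parameter (lowBits, capped at 30 here) rather than the fixed 32/32 split proposed in the quote.

```java
/**
 * Hypothetical sketch (not a Mahout class): a dense double array addressed
 * by a long, stored as an array of arrays. The high bits of the index pick
 * the chunk and the low bits pick the element inside that chunk.
 */
final class LongIndexedArray {
    private final int lowBits;       // number of index bits used inside a chunk (assumption: 1..30)
    private final long lowMask;      // mask that extracts the low bits
    private final long size;         // logical number of elements
    private final double[][] chunks; // outer array; inner chunks are allocated lazily

    LongIndexedArray(long size, int lowBits) {
        if (lowBits < 1 || lowBits > 30) {
            throw new IllegalArgumentException("lowBits must be in 1..30");
        }
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        this.lowBits = lowBits;
        this.lowMask = (1L << lowBits) - 1;
        this.size = size;
        // Round up to whole chunks; an overflowing sum also lands above the limit.
        long chunkCount = (size + lowMask) >>> lowBits;
        if (chunkCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many chunks for one outer array");
        }
        this.chunks = new double[(int) chunkCount][];
    }

    /** High part of the index: which inner array holds the element. */
    int chunkOf(long index) {
        return (int) (index >>> lowBits);
    }

    /** Low part of the index: position inside the inner array. */
    int offsetOf(long index) {
        return (int) (index & lowMask);
    }

    double get(long index) {
        checkIndex(index);
        double[] chunk = chunks[chunkOf(index)];
        // A chunk that was never written reads as all zeros.
        return chunk == null ? 0.0 : chunk[offsetOf(index)];
    }

    void set(long index, double value) {
        checkIndex(index);
        int c = chunkOf(index);
        if (chunks[c] == null) {
            chunks[c] = new double[1 << lowBits];
        }
        chunks[c][offsetOf(index)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

Every access pays for a shift, a mask and an extra array dereference, which is one source of the slowdown mentioned above. Lazy allocation only helps while most chunks stay empty; a fully populated dense array with more than 2^31 doubles occupies over 16 GiB whichever way it is indexed.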

                
> Change id fields to use LongWritable instead of IntWritable
> -----------------------------------------------------------
>
>                 Key: MAHOUT-1055
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1055
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>            Reporter: Markus Paaso
>
> Why is IntWritable used as id field type in Mahout CVB? 
> (org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper)
> Does Long have that significant impact on performance?
> Long is much more usable as id type and int causes compatibility issues like 
> the one below.
> In method org.apache.mahout.utils.vectors.lucene.Driver.getSeqFileWriter() 
> LongWritable is used correctly as id field type.
> I suggest that every IntWritable id should be changed to LongWritable.
> Sequencefile produced by command 'mahout lucene.vector' cannot be handled by 
> command 'mahout cvb' due to this id type incompatibility issue.
> see http://mahout.markmail.org/thread/r3m6ojkpbzlxxizy

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
