[ https://issues.apache.org/jira/browse/MAHOUT-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Bozanich updated MAHOUT-1157:
----------------------------------

    Description: 
AbstractCluster.formatVector's use of the size field of the given vector causes 
problems when the vector is sparse.

I clustered a handful of vectors which had been initialized with a cardinality 
of Integer.MAX_VALUE. Running seqdump on the resulting clusteredPoints took 
over four minutes, because formatVector() iterated over every index from 0 up 
to Integer.MAX_VALUE for every vector, not just over its non-zero entries.
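To make the cost concrete, here is a minimal Java sketch. It does not use Mahout's classes: the "sparse vector" is a hypothetical stand-in (a TreeMap holding only the non-zero entries, plus a separately passed cardinality), and the "index:value" output format is illustrative only, not formatVector's actual format. formatByIndex mirrors the problematic pattern of looping from 0 to size; formatNonZeros visits only the stored entries, so its cost depends on the number of non-zeros rather than on the cardinality.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical illustration only; not Mahout's Vector or AbstractCluster API.
public class SparseFormatSketch {

    // Problematic pattern: walks every index from 0 to size - 1.
    // With size = Integer.MAX_VALUE this performs about 2^31 map lookups,
    // even when only a handful of entries are actually set.
    static String formatByIndex(int size, Map<Integer, Double> nonZeros) {
        StringJoiner out = new StringJoiner(", ", "[", "]");
        for (int i = 0; i < size; i++) {
            Double value = nonZeros.get(i);
            if (value != null && value != 0.0) {
                out.add(i + ":" + value);
            }
        }
        return out.toString();
    }

    // Sparse-friendly pattern: iterates only the stored entries, in index order,
    // skipping any explicitly stored zeros. Cost is proportional to the entry count.
    static String formatNonZeros(SortedMap<Integer, Double> nonZeros) {
        StringJoiner out = new StringJoiner(", ", "[", "]");
        for (Map.Entry<Integer, Double> e : nonZeros.entrySet()) {
            if (e.getValue() != 0.0) {
                out.add(e.getKey() + ":" + e.getValue());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        SortedMap<Integer, Double> v = new TreeMap<>();
        v.put(3, 1.5);
        v.put(Integer.MAX_VALUE - 1, 2.0);
        // Returns immediately; formatByIndex(Integer.MAX_VALUE, v) would scan ~2^31 indices.
        System.out.println(formatNonZeros(v)); // prints [3:1.5, 2147483646:2.0]
    }
}
```

With a cardinality of Integer.MAX_VALUE, formatByIndex does roughly 2^31 lookups per vector no matter how few entries are set, which is consistent with the multi-minute seqdump run described above. In Mahout 0.7 itself, the Vector interface provides iterateNonZero() for walking only the stored elements; whether the attached patch takes that route has not been checked here.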


  was:
AbstractCluster.formatVector's use of the size field of the given vector causes 
problems when the vector is sparse.

When reading WeightedVectorWritables from the clusteredPoints directory that 
was created by running kmeans with the -cl flag, the embedded 
RandomAccessSparseVector is being instantiated with 

I clustered a handful of vectors which had been initialized with a cardinality 
of Integer.MAX_VALUE. Running seqdump on the resulting clusteredPoints took 
over four minutes.


    
> AbstractCluster.formatVector iteration bug.
> -------------------------------------------
>
>                 Key: MAHOUT-1157
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1157
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>            Reporter: Adam Bozanich
>         Attachments: mahout.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
