[
https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631787#comment-13631787
]
Dan Filimon commented on MAHOUT-1190:
-------------------------------------
I'm working on a patch now. I'd also like to improve aggregate() and I'd love
to see nearly all functions implemented in terms of aggregate() and assign().
Some tests are failing for me right now, but one thing I noticed, and had
been getting wrong, is that you should only rely on the iterator when the
vector reports isSequentialAccess() == true.
Otherwise the entries are not visited in index order, so the order of the
operations matters.
This doesn't seem like it should be a problem for clustering since, from what
I've seen, it only does additions when adding a new point to a cluster.
Anyway, I'm also looking at the failing tests and since I'm changing more
things, you can leave this to me.
I'll ping you when I have a patch.
Thanks a lot!
> SequentialAccessSparseVector function assignment is very slow
> -------------------------------------------------------------
>
> Key: MAHOUT-1190
> URL: https://issues.apache.org/jira/browse/MAHOUT-1190
> Project: Mahout
> Issue Type: Bug
> Reporter: Dan Filimon
> Attachments: MAHOUT-1190-1.patch, MAHOUT-1190-iterator-fix.patch,
> MAHOUT-1190-iterator-fix.patch, MAHOUT-1190-iterator-fix.patch,
> MAHOUT-1190.patch, MAHOUT-1190-seq-dot-product.patch,
> MAHOUT-1190-seq-dot-product.patch
>
>
> Currently, calling .assign() on a SequentialAccessSparseVector (SASV) with
> another vector and a custom function iterates through the vector and assigns
> every single entry, looking each one up by index.
> This makes the process *hugely* expensive: on a run of BallKMeans on the 20
> newsgroups data set, profiling showed that 92% of the runtime was spent
> assigning the vectors.
> Here's a prototype patch:
> https://github.com/dfilimon/mahout/commit/63998d82bb750150a6ae09052dadf6c326c62d3d
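
To illustrate why index-based assignment is costly on a sequential-access
sparse layout, here is a minimal, self-contained sketch. It is not Mahout's
code and not the linked patch: ToySeqVector, assignByIndex and assignByMerge
are hypothetical names, and the storage (two parallel arrays sorted by index)
is an assumed simplification. The merge variant also assumes f(0, 0) == 0, so
positions that are zero in both vectors can be skipped.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical toy vector: parallel arrays kept sorted by index. Not Mahout's class. */
final class ToySeqVector {
    int[] indices = new int[4];
    double[] values = new double[4];
    int size;                 // number of stored entries
    final int cardinality;    // logical length of the vector

    ToySeqVector(int cardinality) {
        this.cardinality = cardinality;
    }

    /** Lookup by index: a binary search over the stored entries. */
    double get(int index) {
        int pos = Arrays.binarySearch(indices, 0, size, index);
        return pos >= 0 ? values[pos] : 0.0;
    }

    /** Write by index: binary search, plus an array shift when a new entry is inserted. */
    void set(int index, double value) {
        int pos = Arrays.binarySearch(indices, 0, size, index);
        if (pos >= 0) {
            values[pos] = value;
            return;
        }
        if (value == 0.0) {
            return; // nothing to store for an implicit zero
        }
        if (size == indices.length) {
            int newCapacity = Math.max(4, size * 2);
            indices = Arrays.copyOf(indices, newCapacity);
            values = Arrays.copyOf(values, newCapacity);
        }
        int at = -pos - 1; // insertion point reported by binarySearch
        System.arraycopy(indices, at, indices, at + 1, size - at);
        System.arraycopy(values, at, values, at + 1, size - at);
        indices[at] = index;
        values[at] = value;
        size++;
    }

    /** Slow variant: touches every index and pays two searches plus a write for each. */
    void assignByIndex(ToySeqVector other, DoubleBinaryOperator f) {
        for (int i = 0; i < cardinality; i++) {
            set(i, f.applyAsDouble(get(i), other.get(i)));
        }
    }

    /** Merge variant: one linear pass over both sorted entry lists (assumes f(0, 0) == 0). */
    void assignByMerge(ToySeqVector other, DoubleBinaryOperator f) {
        int[] outIdx = new int[size + other.size];
        double[] outVal = new double[size + other.size];
        int a = 0, b = 0, n = 0;
        while (a < size || b < other.size) {
            int ia = a < size ? indices[a] : Integer.MAX_VALUE;
            int ib = b < other.size ? other.indices[b] : Integer.MAX_VALUE;
            int idx = Math.min(ia, ib);
            double x = ia == idx ? values[a++] : 0.0;
            double y = ib == idx ? other.values[b++] : 0.0;
            double r = f.applyAsDouble(x, y);
            if (r != 0.0) {   // keep only non-zero results
                outIdx[n] = idx;
                outVal[n] = r;
                n++;
            }
        }
        indices = outIdx;
        values = outVal;
        size = n;
    }
}
```

In this sketch both variants produce the same values for a function such as
(x, y) -> x + y, but assignByIndex does work proportional to the cardinality
times the cost of a search (and possibly a shift), while assignByMerge does
work proportional to the number of stored entries in the two vectors.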