[ https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629031#comment-13629031 ]
Jake Mannix commented on MAHOUT-1190:
-------------------------------------
"Sequential access is a slow format."
?!?!
It was designed for speed (not memory), but only for some very specific (and
common) uses. The point is that it should a) only be used immutably, and b) be
used for things that primarily require iteration over the whole nonzero range
(dot, plus, minus, cross, norm, etc.).
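To make the "iteration over the nonzero range" point concrete, here is a minimal Java sketch of a sequential-access layout. It assumes nonzero indices are kept sorted in one array with their values in a parallel array; the class names SortedSparse and SortedSparseDemo are hypothetical, and this is not Mahout's actual SequentialAccessSparseVector code. Under that assumption, dot and norm become single linear passes with no per-index lookups:

```java
// Hypothetical model of a sequential-access sparse vector: sorted nonzero
// indices plus a parallel value array. Illustration only, not Mahout code.
final class SortedSparse {
    final int[] indices;   // strictly increasing nonzero positions
    final double[] values; // values[k] belongs to indices[k]

    SortedSparse(int[] indices, double[] values) {
        this.indices = indices.clone();
        this.values = values.clone();
    }

    // Dot product as a linear merge of the two nonzero ranges:
    // O(nnz(this) + nnz(other)), no per-index lookups.
    double dot(SortedSparse other) {
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < indices.length && j < other.indices.length) {
            if (indices[i] == other.indices[j]) {
                sum += values[i++] * other.values[j++];
            } else if (indices[i] < other.indices[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    // Squared L2 norm: a single pass over the stored values.
    double norm2Squared() {
        double s = 0.0;
        for (double v : values) s += v * v;
        return s;
    }
}

class SortedSparseDemo {
    public static void main(String[] args) {
        SortedSparse x = new SortedSparse(new int[]{1, 4, 7}, new double[]{2.0, 3.0, 5.0});
        SortedSparse y = new SortedSparse(new int[]{0, 4, 7, 9}, new double[]{1.0, 10.0, 2.0, 8.0});
        System.out.println(x.dot(y));         // prints 40.0 (3*10 + 5*2)
        System.out.println(x.norm2Squared()); // prints 38.0 (4 + 9 + 25)
    }
}
```

The same sorted layout that makes these passes cheap makes random writes expensive (a new nonzero has to be inserted in order), which is why the format fits immutable, iteration-heavy use.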
> SequentialAccessSparseVector function assignment is very slow
> -------------------------------------------------------------
>
> Key: MAHOUT-1190
> URL: https://issues.apache.org/jira/browse/MAHOUT-1190
> Project: Mahout
> Issue Type: Bug
> Reporter: Dan Filimon
>
> Currently, when .assign() is called on a SASV with another vector and a custom
> function, it iterates through the vector and assigns every single entry,
> looking each one up by index.
> This makes the process *hugely* expensive: on a run of BallKMeans on the 20
> Newsgroups data set, profiling revealed that 92% of the runtime was spent
> assigning to the vectors.
> Here's a prototype patch:
> https://github.com/dfilimon/mahout/commit/63998d82bb750150a6ae09052dadf6c326c62d3d
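A rough sketch of why index-driven assignment hurts on this kind of layout, again assuming sorted parallel arrays. The names SortedVec, assignByIndex, assignByMerge and AssignDemo are hypothetical; this is neither the linked patch nor Mahout's API. assignByIndex visits every position and pairs a binary-search get with a set that may shift the arrays, while assignByMerge walks the two nonzero ranges once and builds fresh arrays:

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

// Hypothetical sorted-array sparse vector, used only to contrast two ways of
// computing this[i] = f(this[i], other[i]). Not Mahout code.
final class SortedVec {
    final int cardinality; // logical dimension
    int[] indices;         // strictly increasing positions in the first `size` slots
    double[] values;       // parallel to indices
    int size;              // number of stored entries

    SortedVec(int cardinality, int[] indices, double[] values) {
        this.cardinality = cardinality;
        this.indices = indices.clone();
        this.values = values.clone();
        this.size = indices.length;
    }

    // Random-access read: binary search, O(log nnz).
    double get(int index) {
        int k = Arrays.binarySearch(indices, 0, size, index);
        return k >= 0 ? values[k] : 0.0;
    }

    // Random-access write: inserting a new entry shifts the tail, O(nnz).
    void set(int index, double value) {
        int k = Arrays.binarySearch(indices, 0, size, index);
        if (k >= 0) { values[k] = value; return; }
        if (value == 0.0) return; // absent and zero: nothing to store
        int pos = -k - 1;         // insertion point
        if (size == indices.length) {
            int cap = Math.max(1, size * 2);
            indices = Arrays.copyOf(indices, cap);
            values = Arrays.copyOf(values, cap);
        }
        System.arraycopy(indices, pos, indices, pos + 1, size - pos);
        System.arraycopy(values, pos, values, pos + 1, size - pos);
        indices[pos] = index;
        values[pos] = value;
        size++;
    }

    // Index-driven assign: a get/set pair (two binary searches) for every
    // position, plus an O(nnz) shift whenever a new entry is created.
    void assignByIndex(SortedVec other, DoubleBinaryOperator f) {
        for (int i = 0; i < cardinality; i++) {
            set(i, f.applyAsDouble(get(i), other.get(i)));
        }
    }

    // Merge-based assign: one pass over both nonzero ranges, O(nnz + other.nnz).
    // Assumes f(0, 0) == 0, because positions absent from both are skipped.
    void assignByMerge(SortedVec other, DoubleBinaryOperator f) {
        int[] outIdx = new int[size + other.size];
        double[] outVal = new double[size + other.size];
        int i = 0, j = 0, n = 0;
        while (i < size || j < other.size) {
            int idx;
            double x = 0.0, y = 0.0;
            if (j >= other.size || (i < size && indices[i] < other.indices[j])) {
                idx = indices[i]; x = values[i++];                         // only in this
            } else if (i >= size || other.indices[j] < indices[i]) {
                idx = other.indices[j]; y = other.values[j++];             // only in other
            } else {
                idx = indices[i]; x = values[i++]; y = other.values[j++];  // in both
            }
            double r = f.applyAsDouble(x, y);
            if (r != 0.0) { outIdx[n] = idx; outVal[n] = r; n++; }
        }
        indices = outIdx;
        values = outVal;
        size = n;
    }
}

class AssignDemo {
    public static void main(String[] args) {
        DoubleBinaryOperator f = (x, y) -> x + 2 * y; // f(0, 0) == 0
        SortedVec other = new SortedVec(10, new int[]{0, 4, 9}, new double[]{1, 10, 8});
        SortedVec slow = new SortedVec(10, new int[]{1, 4, 7}, new double[]{2, 3, 5});
        SortedVec fast = new SortedVec(10, new int[]{1, 4, 7}, new double[]{2, 3, 5});
        slow.assignByIndex(other, f);
        fast.assignByMerge(other, f);
        for (int i = 0; i < 10; i++) {
            // the two columns should agree at every position
            System.out.println(i + ": " + slow.get(i) + " " + fast.get(i));
        }
    }
}
```

The merge version skips positions that are zero in both vectors, so it is only correct when f(0, 0) == 0; for other functions, a general-purpose assign would still have to visit every position.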
--
This message is automatically generated by JIRA.