[ 
https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628373#comment-13628373
 ] 

Ted Dunning commented on MAHOUT-1190:
-------------------------------------

SASV is screaming fast for dot-product-based programs because dots on large 
vectors turn into very cache-friendly merges.  The difference versus RASVs can 
literally be 10x or more due to this cache effect.
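
To illustrate why the merge is cheap, here is a minimal sketch. It is not 
Mahout's actual code. It assumes a sparse vector is stored as two parallel 
arrays with indices in strictly increasing order; the class and method names 
are made up for illustration.

```java
// Hypothetical stand-in for a sequential-access sparse vector: parallel arrays
// with indices sorted in strictly increasing order. Not Mahout's real class.
final class SortedSparseDot {

  // Dot product as a two-pointer merge over the sorted index arrays.
  static double dot(int[] idxA, double[] valA, int[] idxB, double[] valB) {
    double sum = 0.0;
    int i = 0;
    int j = 0;
    while (i < idxA.length && j < idxB.length) {
      if (idxA[i] < idxB[j]) {
        i++;                          // index present only in A, contributes 0
      } else if (idxA[i] > idxB[j]) {
        j++;                          // index present only in B, contributes 0
      } else {
        sum += valA[i] * valB[j];     // index present in both vectors
        i++;
        j++;
      }
    }
    return sum;
  }
}
```

Both cursors only ever move forward, so each array is read front to back in 
one pass. A hash-backed vector would instead probe scattered locations for 
each lookup, which is where the cache effect comes from.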

I think it is valuable.
                
> SequentialAccessSparseVector function assignment is very slow
> -------------------------------------------------------------
>
>                 Key: MAHOUT-1190
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1190
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Dan Filimon
>
> Currently when calling .assign() on a SASV with another vector and a custom 
> function, it will iterate through it and assign every single entry while also 
> referring to it by index.
> This makes the process *hugely* expensive. (On a run of BallKMeans on the 20 
> newsgroups data set, profiling reveals that 92% of the runtime was spent 
> assigning the vectors.)
> Here's a prototype patch:
> https://github.com/dfilimon/mahout/commit/63998d82bb750150a6ae09052dadf6c326c62d3d
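
For illustration only (this is not taken from the linked patch, and the names 
below are hypothetical), one way to avoid per-entry lookups by index is to 
merge the two sorted index lists and build the result in a single pass. The 
sketch assumes f(0, 0) == 0, so indices that are zero in both vectors can be 
skipped; a function without that property would need a dense pass instead.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

// Hypothetical sorted sparse vector: parallel arrays, indices strictly increasing.
final class SortedSparseAssign {
  final int[] indices;
  final double[] values;

  SortedSparseAssign(int[] indices, double[] values) {
    this.indices = indices;
    this.values = values;
  }

  // Computes f(a[k], b[k]) for every index k that is non-zero in a or b.
  // Assumption: f(0, 0) == 0, so all other indices stay zero.
  static SortedSparseAssign merge(SortedSparseAssign a, SortedSparseAssign b,
                                  DoubleBinaryOperator f) {
    int[] outIdx = new int[a.indices.length + b.indices.length];
    double[] outVal = new double[outIdx.length];
    int i = 0;
    int j = 0;
    int n = 0;
    while (i < a.indices.length || j < b.indices.length) {
      int index;
      double x = 0.0;
      double y = 0.0;
      if (j == b.indices.length
          || (i < a.indices.length && a.indices[i] < b.indices[j])) {
        index = a.indices[i];         // entry only in a
        x = a.values[i++];
      } else if (i == a.indices.length || b.indices[j] < a.indices[i]) {
        index = b.indices[j];         // entry only in b
        y = b.values[j++];
      } else {
        index = a.indices[i];         // entry in both
        x = a.values[i++];
        y = b.values[j++];
      }
      double r = f.applyAsDouble(x, y);
      if (r != 0.0) {                 // keep the result sparse
        outIdx[n] = index;
        outVal[n] = r;
        n++;
      }
    }
    return new SortedSparseAssign(Arrays.copyOf(outIdx, n), Arrays.copyOf(outVal, n));
  }
}
```

Each input entry is visited once and the output is appended in index order, 
so no binary search or shifting of existing entries is needed.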

