[
https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631775#comment-13631775
]
Robin Anil commented on MAHOUT-1190:
------------------------------------
I am comparing homogeneous benchmarks: Dense vs. Dense, Seq vs. Seq. To iterate
faster, I disabled the Dense.fn(Rand)... etc. in the code. I also disabled the
clustering benchmarks.
If you have a patch ready, please upload it and I can run it on my machine to
measure the improvement.
Dan, can you help me out by checking whether those clustering tests and Pearson
are failing because I fixed the iterator? I think the expectations are
wrong, but I don't know enough about those tests.
> SequentialAccessSparseVector function assignment is very slow
> -------------------------------------------------------------
>
> Key: MAHOUT-1190
> URL: https://issues.apache.org/jira/browse/MAHOUT-1190
> Project: Mahout
> Issue Type: Bug
> Reporter: Dan Filimon
> Attachments: MAHOUT-1190-1.patch, MAHOUT-1190-iterator-fix.patch,
> MAHOUT-1190-iterator-fix.patch, MAHOUT-1190-iterator-fix.patch,
> MAHOUT-1190.patch, MAHOUT-1190-seq-dot-product.patch,
> MAHOUT-1190-seq-dot-product.patch
>
>
> Currently, when calling .assign() on a SASV with another vector and a custom
> function, it will iterate through it and assign every single entry while also
> referring to it by index.
> This makes the process *hugely* expensive (on a run of BallKMeans on the 20
> newsgroups data set, profiling reveals that 92% of the runtime was spent
> assigning the vectors).
> Here's a prototype patch:
> https://github.com/dfilimon/mahout/commit/63998d82bb750150a6ae09052dadf6c326c62d3d
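
To illustrate the cost described in the issue, here is a minimal sketch that
contrasts per-index assignment with a single merge pass over two sorted sparse
vectors. It is not the Mahout implementation or the linked patch: SeqSparse,
assignByIndex and assignByMerge are hypothetical names, and the merge variant
assumes the function satisfies f(0, 0) == 0.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

public class SparseAssignSketch {
    /** Hypothetical stand-in for a sequential-access sparse vector:
     *  sorted indices with a parallel array of values. */
    static final class SeqSparse {
        final int size;
        int[] indices;
        double[] values;
        int count;

        SeqSparse(int size, int[] indices, double[] values) {
            this.size = size;
            this.indices = indices.clone();
            this.values = values.clone();
            this.count = indices.length;
        }

        /** Lookup by index costs a binary search. */
        double get(int index) {
            int pos = Arrays.binarySearch(indices, 0, count, index);
            return pos >= 0 ? values[pos] : 0.0;
        }

        /** Setting a missing index shifts every later entry, so it is O(count). */
        void set(int index, double value) {
            int pos = Arrays.binarySearch(indices, 0, count, index);
            if (pos >= 0) { values[pos] = value; return; }
            if (value == 0.0) return;
            int insertAt = -pos - 1;
            if (count == indices.length) {
                int newCapacity = Math.max(4, count * 2);
                indices = Arrays.copyOf(indices, newCapacity);
                values = Arrays.copyOf(values, newCapacity);
            }
            System.arraycopy(indices, insertAt, indices, insertAt + 1, count - insertAt);
            System.arraycopy(values, insertAt, values, insertAt + 1, count - insertAt);
            indices[insertAt] = index;
            values[insertAt] = value;
            count++;
        }
    }

    /** Slow pattern: touch every index and go through get/set each time. */
    static void assignByIndex(SeqSparse a, SeqSparse b, DoubleBinaryOperator f) {
        for (int i = 0; i < a.size; i++) {
            a.set(i, f.applyAsDouble(a.get(i), b.get(i)));
        }
    }

    /** Merge pattern: walk both sorted index lists once.
     *  Assumes f(0, 0) == 0, so indices absent from both vectors are skipped. */
    static void assignByMerge(SeqSparse a, SeqSparse b, DoubleBinaryOperator f) {
        int[] outIndices = new int[a.count + b.count];
        double[] outValues = new double[a.count + b.count];
        int i = 0, j = 0, n = 0;
        while (i < a.count || j < b.count) {
            int ia = i < a.count ? a.indices[i] : Integer.MAX_VALUE;
            int ib = j < b.count ? b.indices[j] : Integer.MAX_VALUE;
            int index = Math.min(ia, ib);
            double x = ia == index ? a.values[i++] : 0.0;
            double y = ib == index ? b.values[j++] : 0.0;
            double result = f.applyAsDouble(x, y);
            if (result != 0.0) {
                outIndices[n] = index;
                outValues[n] = result;
                n++;
            }
        }
        a.indices = outIndices;
        a.values = outValues;
        a.count = n;
    }

    public static void main(String[] args) {
        SeqSparse a = new SeqSparse(10, new int[] {1, 4}, new double[] {2.0, 3.0});
        SeqSparse b = new SeqSparse(10, new int[] {4, 7}, new double[] {1.0, 5.0});
        assignByMerge(a, b, Double::sum);
        for (int k = 0; k < a.count; k++) {
            System.out.println(a.indices[k] + " -> " + a.values[k]);
        }
    }
}
```

The per-index version does a binary search for every position in the vector and
may shift the backing arrays on each insert, while the merge version visits
only the entries that are present in at least one of the two vectors.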