[
https://issues.apache.org/jira/browse/SPARK-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213826#comment-14213826
]
Apache Spark commented on SPARK-4431:
-------------------------------------
User 'dbtsai' has created a pull request for this issue:
https://github.com/apache/spark/pull/3288
> Implement efficient activeIterator for dense and sparse vector
> --------------------------------------------------------------
>
> Key: SPARK-4431
> URL: https://issues.apache.org/jira/browse/SPARK-4431
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: DB Tsai
>
> Previously, we used Breeze's activeIterator to access the non-zero
> elements of a sparse vector, and explicitly skipped zeros in dense/sparse
> vectors using pattern matching. Due to the overhead, we switched back to a
> native `while` loop in #SPARK-4129.
> However, #SPARK-4129 dereferences dv.values/sv.values on each access to a
> value, and any zeros in the dense or sparse vector are skipped inside the
> add function call; the overall penalty is around 10% compared with
> dereferencing once outside the while block and checking for zero before
> calling the add function. The code is also branched for dense and sparse
> vectors, which is not easy to maintain in the long term.
> This activeIterator implementation not only improves performance; the
> abstraction for accessing the non-zero elements of different vector types
> also helps the maintainability of the codebase. In this PR, only
> MultivariateOnlineSummarizer uses the new API, as an example; other code
> can be migrated to activeIterator later.
> Benchmarked with the mnist8m dataset on a single JVM, with the first 200
> samples loaded in memory and repeated 5000 times.
> Before change:
> Sparse Vector - 30.02
> Dense Vector - 38.27
> After this optimization:
> Sparse Vector - 27.54
> Dense Vector - 35.13
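
For readers who want a concrete picture of the idea in the description above, here is a minimal sketch. It is not the code from the pull request: `Vec`, `Dense`, `Sparse`, `foreachNonZero` and `sumAndNnz` are hypothetical stand-ins invented for illustration, not MLlib's classes or API. It only shows the two points the description makes: the values array is bound once outside the `while` loop, and the zero check happens before the callback is invoked, behind a single entry point shared by dense and sparse vectors.

```scala
object ActiveSketch {
  // Hypothetical stand-ins for dense and sparse vectors (not MLlib's classes).
  sealed trait Vec { def size: Int }
  final case class Dense(values: Array[Double]) extends Vec {
    def size: Int = values.length
  }
  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double])
    extends Vec

  // Calls f(index, value) for every stored entry whose value is non-zero.
  def foreachNonZero(v: Vec)(f: (Int, Double) => Unit): Unit = v match {
    case Dense(values) =>
      // `values` is bound once here, outside the loop.
      val n = values.length
      var i = 0
      while (i < n) {
        val x = values(i)
        if (x != 0.0) f(i, x) // zero check happens before the call
        i += 1
      }
    case Sparse(_, indices, values) =>
      val nnz = values.length
      var k = 0
      while (k < nnz) {
        val x = values(k)
        // A sparse vector may still store explicit zeros, so check here too.
        if (x != 0.0) f(indices(k), x)
        k += 1
      }
  }

  // Example caller: sum and count of the non-zero entries of one vector.
  def sumAndNnz(v: Vec): (Double, Int) = {
    var sum = 0.0
    var count = 0
    foreachNonZero(v) { (_, x) => sum += x; count += 1 }
    (sum, count)
  }
}
```

A caller such as a summarizer would then need no dense/sparse branching of its own; for example, `ActiveSketch.sumAndNnz(ActiveSketch.Dense(Array(1.0, 0.0, 2.5)))` visits only the two non-zero entries.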