DB Tsai created SPARK-4708:
------------------------------

             Summary: k-means runs two/three times faster with dense/sparse sample
                 Key: SPARK-4708
                 URL: https://issues.apache.org/jira/browse/SPARK-4708
             Project: Spark
          Issue Type: Improvement
            Reporter: DB Tsai


`breezeSquaredDistance`, which is called from 
`org.apache.spark.mllib.util.MLUtils.fastSquaredDistance`, is on the critical 
path of k-means, and it is slow. We should replace it with our own 
implementation.
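
For illustration only, here is a minimal sketch of what a hand-written squared distance could look like. It is written in plain Python rather than Scala and is not the code from the PR. The function names and the `(indices, values)` layout for sparse vectors are assumptions made for this example. The idea is to accumulate the squared differences in a single loop, and for sparse inputs to walk only the stored entries in a merge-style pass.

```python
from typing import Sequence, Tuple

# Hypothetical sparse layout for this sketch: (sorted indices, matching values).
SparseVec = Tuple[Sequence[int], Sequence[float]]


def sqdist_dense(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two dense vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        d = x - y
        total += d * d
    return total


def sqdist_sparse(a: SparseVec, b: SparseVec) -> float:
    """Squared Euclidean distance between two sparse vectors.

    Only stored entries are visited; an index missing from one vector
    is treated as a zero in that vector.
    """
    a_idx, a_val = a
    b_idx, b_val = b
    i = j = 0
    total = 0.0
    # Merge-style walk over the two sorted index arrays.
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            d = a_val[i] - b_val[j]
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            d = a_val[i]  # b is zero at this index
            i += 1
        else:
            d = b_val[j]  # a is zero at this index
            j += 1
        total += d * d
    # Entries left over in either vector face zeros in the other one.
    for k in range(i, len(a_idx)):
        total += a_val[k] * a_val[k]
    for k in range(j, len(b_idx)):
        total += b_val[k] * b_val[k]
    return total
```

For example, `sqdist_sparse(([0, 2], [1.0, 3.0]), ([1, 2], [2.0, 5.0]))` gives the same result as `sqdist_dense([1.0, 0.0, 3.0], [0.0, 2.0, 5.0])`.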

Here is the benchmark on the mnist8m dataset.

Before:
DenseVector: 70.04 secs
SparseVector: 59.05 secs

With this PR:
DenseVector: 30.58 secs
SparseVector: 21.14 secs
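
As a quick check of the "two/three times faster" claim in the summary, the speedup factors can be computed from the timings above. The snippet is only arithmetic on the reported numbers; the variable names are made up for this example.

```python
# (before, after) wall-clock times in seconds, copied from the benchmark above.
timings = {
    "DenseVector": (70.04, 30.58),
    "SparseVector": (59.05, 21.14),
}

# Speedup factor = time before / time with the PR.
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
# DenseVector: 2.29x
# SparseVector: 2.79x
```

So the dense case is a bit more than twice as fast, and the sparse case is close to three times as fast.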


