[jira] [Created] (SPARK-4708) k-mean runs two/three times faster with dense/sparse sample

DB Tsai (JIRA) Tue, 02 Dec 2014 17:21:23 -0800

DB Tsai created SPARK-4708:
------------------------------

             Summary: k-mean runs two/three times faster with dense/sparse sample
                 Key: SPARK-4708
                 URL: https://issues.apache.org/jira/browse/SPARK-4708
             Project: Spark
          Issue Type: Improvement
            Reporter: DB Tsai



The call to `breezeSquaredDistance` in 
`org.apache.spark.mllib.util.MLUtils.fastSquaredDistance` is on the critical 
path of k-means, and `breezeSquaredDistance` is slow. We should replace it 
with our own implementation.
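
For illustration only, a hand-written squared distance could look roughly like the sketch below. It is written in Python for brevity and is not the code in the PR. The function names `dense_sq_dist` and `sparse_sq_dist` are hypothetical, and the sparse vectors are assumed to be stored as strictly increasing index arrays with matching value arrays.

```python
from typing import Sequence


def dense_sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two dense vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        d = x - y
        total += d * d
    return total


def sparse_sq_dist(
    idx_a: Sequence[int],
    val_a: Sequence[float],
    idx_b: Sequence[int],
    val_b: Sequence[float],
) -> float:
    """Squared Euclidean distance between two sparse vectors.

    Assumption: indices are strictly increasing, and any index that is not
    listed holds the value zero.
    """
    i = j = 0
    total = 0.0
    # Walk both index arrays in step, like the merge phase of merge sort.
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            d = val_a[i] - val_b[j]  # both vectors are non-zero here
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            d = val_a[i]  # only a is non-zero at this index
            i += 1
        else:
            d = val_b[j]  # only b is non-zero at this index
            j += 1
        total += d * d
    # Whatever is left in either vector has no counterpart in the other.
    for k in range(i, len(idx_a)):
        total += val_a[k] * val_a[k]
    for k in range(j, len(idx_b)):
        total += val_b[k] * val_b[k]
    return total
```

The sparse version touches only the stored entries, so its cost grows with the number of non-zeros rather than with the vector dimension.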

Here is the benchmark on the mnist8m dataset.

Before
DenseVector: 70.04 secs
SparseVector: 59.05 secs

With this PR
DenseVector: 30.58 secs
SparseVector: 21.14 secs
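
As a quick check on the "two/three times faster" wording, the speedup implied by the timings above can be computed as follows. This is a small sketch; the helper name `speedup` is hypothetical.

```python
def speedup(before_secs: float, after_secs: float) -> float:
    """Ratio of the old running time to the new running time."""
    return before_secs / after_secs


# Timings copied from the benchmark above, in seconds.
dense = speedup(70.04, 30.58)
sparse = speedup(59.05, 21.14)

print(f"DenseVector speedup: {dense:.2f}x")
print(f"SparseVector speedup: {sparse:.2f}x")
```

This works out to roughly 2.3x for DenseVector and 2.8x for SparseVector.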



