Vincent created SPARK-21058:
-------------------------------

             Summary: potential SVD optimization
                 Key: SPARK-21058
                 URL: https://issues.apache.org/jira/browse/SPARK-21058
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
    Affects Versions: 2.1.1
            Reporter: Vincent


In the current implementation, computeSVD computes the SVD of a matrix A by 
first computing the Gramian matrix A^T * A and then running SVD on it. We found 
that the Gramian computation is the hot spot of the overall SVD computation. 
Going through the Gramian benefits a tall, skinny matrix, but for a non-skinny 
matrix it can become a huge overhead. So, is it possible to offer another 
option that computes the SVD on the original matrix instead of the Gramian 
matrix? We could choose between the two paths based on the ratio between 
numCols and numRows, or simply through a user setting.
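
For reference, the identity behind the Gramian route can be written out as 
follows. This is a sketch that assumes A is a numRows x numCols matrix with 
thin SVD A = U Sigma V^T; the symbols sigma_i, lambda_i, u_i and v_i are 
notation introduced here, not names from the Spark code.

```latex
A = U \Sigma V^{T}
\quad\Longrightarrow\quad
A^{T} A = V \Sigma U^{T} U \Sigma V^{T} = V \Sigma^{2} V^{T},
\qquad
\sigma_i = \sqrt{\lambda_i\!\left(A^{T} A\right)},
\qquad
u_i = \frac{1}{\sigma_i}\, A v_i \quad (\sigma_i > 0).
```

The Gramian is numCols x numCols, so its size depends only on numCols. That 
is presumably why the detour is cheap for a skinny matrix and expensive once 
numCols grows.
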
We have observed a substantial performance gain on a toy dataset by running SVD 
on the original matrix instead of the Gramian matrix. If the proposal is 
acceptable, we will start working on a patch and gather more performance data.
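
To make the proposed selection concrete, here is a minimal sketch of how the 
choice between the two paths might be expressed. Everything in it is 
hypothetical: the function choose_svd_path, the mode strings, and the 
max_skinny_ratio default of 0.1 are placeholders for illustration, not part of 
Spark's API and not a measured threshold.

```python
# Hypothetical names; nothing here is part of Spark's API.
GRAMIAN = "gramian"  # current path: SVD via the Gramian A^T * A
DIRECT = "direct"    # proposed path: SVD on the original matrix A


def choose_svd_path(num_rows: int, num_cols: int,
                    mode: str = "auto",
                    max_skinny_ratio: float = 0.1) -> str:
    """Pick an SVD path from the matrix shape or an explicit user setting.

    max_skinny_ratio is an arbitrary placeholder; a real cutoff would have
    to come from performance measurements.
    """
    if num_rows <= 0 or num_cols <= 0:
        raise ValueError("matrix dimensions must be positive")
    # An explicit user setting always wins over the shape heuristic.
    if mode in (GRAMIAN, DIRECT):
        return mode
    if mode != "auto":
        raise ValueError(f"unknown mode: {mode!r}")
    # Skinny matrix (few columns relative to rows): keep the Gramian path.
    ratio = num_cols / num_rows
    return GRAMIAN if ratio <= max_skinny_ratio else DIRECT


if __name__ == "__main__":
    print(choose_svd_path(1_000_000, 100))              # skinny matrix
    print(choose_svd_path(10_000, 8_000))               # non-skinny matrix
    print(choose_svd_path(10_000, 8_000, mode=GRAMIAN)) # user override
```

The explicit mode is checked first so that a user setting can force either 
path regardless of shape, which matches the two selection options described 
above.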


