[
https://issues.apache.org/jira/browse/SPARK-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961768#comment-15961768
]
Hayri Volkan Agun commented on SPARK-7856:
------------------------------------------
Hi Tarek,
Still on the issue of Probabilistic PCA: it would be very useful if there were an
implementation whose cost is parameterized by the number of principal components.
> Scalable PCA implementation for tall and fat matrices
> -----------------------------------------------------
>
> Key: SPARK-7856
> URL: https://issues.apache.org/jira/browse/SPARK-7856
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Tarek Elgamal
>
> Currently, the PCA implementation must fit the d^2 entries of the
> covariance/Gramian matrix in memory (where d is the number of
> columns/dimensions of the matrix). We often need only the largest k principal
> components. To make PCA truly scalable, I suggest an implementation whose
> memory usage is proportional to the number of principal components k rather
> than to the full dimensionality d.
> I suggest adopting the solution described in this paper that is published in
> SIGMOD 2015 (http://ds.qcri.org/images/profile/tarek_elgamal/sigmod2015.pdf).
> The paper offers an implementation of Probabilistic PCA (PPCA) that has
> lower memory and time complexity and could potentially scale to tall and fat
> matrices, rather than only the tall and skinny matrices supported by the
> current PCA implementation.
> Probabilistic PCA could potentially be added to the set of algorithms
> supported by MLlib; it does not necessarily replace the old PCA
> implementation.
> A PPCA implementation is included in MATLAB's Statistics and Machine Learning
> Toolbox (http://www.mathworks.com/help/stats/ppca.html).
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)