Re: spark1.0 principal component analysis
Hi,

I don't think anybody answered this question...

fintis wrote:
> How do I match the principal components to the actual features since there is some sorting?

Would anybody be able to shed a little light on it, since I too am struggling with this?

Many thanks!!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-principal-component-analysis-tp9249p16556.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: spark1.0 principal component analysis
computePrincipalComponents returns a local matrix X whose columns are the principal components, in sorted order. Those column vectors are in the same feature space as the input feature vectors, so row i of X corresponds to input feature i; only the columns are sorted, not the features.

-Xiangrui

On Thu, Oct 16, 2014 at 2:39 AM, al123 ant.lay...@hotmail.co.uk wrote:
> Hi,
>
> I don't think anybody answered this question...
>
> fintis wrote:
>> How do I match the principal components to the actual features since there is some sorting?
>
> Would anybody be able to shed a little light on it, since I too am struggling with this?
>
> Many thanks!!
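To make the row-to-feature correspondence concrete, here is a minimal sketch in NumPy rather than Spark. It is not MLlib code: the helper `principal_components`, the feature names, and the synthetic data are all assumptions made for illustration. It builds an n x k matrix of components the same way (columns sorted by variance) and reads the loadings back row by row.

```python
import numpy as np

def principal_components(data, k):
    """Return an n x k matrix whose columns are the top-k principal components.

    Hypothetical helper for illustration; it mimics the shape and ordering of
    the matrix described above, not the MLlib implementation.
    """
    cov = np.cov(data, rowvar=False)          # n x n covariance of the columns
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    return eigvecs[:, order]

feature_names = ["height", "weight", "age"]   # made-up feature names
rng = np.random.default_rng(0)
# Synthetic data in which "weight" has by far the largest variance.
data = rng.normal(size=(500, 3)) * np.array([1.0, 10.0, 0.1])

pc = principal_components(data, k=2)

# Row i of pc holds the loadings of input feature i on each component.
for name, loadings in zip(feature_names, pc):
    print(name, np.round(loadings, 3))

# The feature with the largest absolute loading on the first component.
dominant = feature_names[int(np.argmax(np.abs(pc[:, 0])))]
print("first component is dominated by:", dominant)
```

The sorting affects only which column comes first. The rows stay in the order of the input features, so no extra bookkeeping is needed to map loadings back to features.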
Re: spark1.0 principal component analysis
sowen wrote:
> it seems that the singular values from the SVD aren't returned, so I don't know that you can access this directly

It's not clear to me why these aren't returned. The S matrix would be useful for determining a reasonable value for K.
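As an illustration of why those values help with picking K, the sketch below chooses the smallest k whose leading components explain a given share of the total variance. The function `choose_k`, the threshold, and the example spectrum are hypothetical. It assumes the inputs are component variances (eigenvalues of the covariance matrix) in descending order; singular values of the centered data matrix would need to be squared first.

```python
import numpy as np

def choose_k(variances, threshold=0.95):
    """Smallest k whose leading components explain at least `threshold`
    of the total variance. `variances` must be sorted in descending order."""
    variances = np.asarray(variances, dtype=float)
    ratio = np.cumsum(variances) / variances.sum()   # cumulative explained variance
    k = int(np.searchsorted(ratio, threshold)) + 1   # first index reaching the threshold
    return min(k, len(variances))                    # guard against rounding at the top end

spectrum = [6.0, 3.0, 0.9, 0.1]   # made-up variances, largest first
print(choose_k(spectrum, threshold=0.95))
```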
Re: spark1.0 principal component analysis
In its current implementation, MLlib computes the principal components in two steps:

1) In a distributed fashion, compute the covariance matrix. The result is a local matrix.
2) On this local matrix, compute the SVD.

The sorting comes from the SVD. If you want to get the eigenvalues out, you can simply run step 1 yourself on your RowMatrix via the (experimental) computeCovariance() method, and then run an SVD on the result using a library like Breeze.

- Evan

On Tue, Sep 23, 2014 at 12:49 PM, st553 sthompson...@gmail.com wrote:
> sowen wrote:
>> it seems that the singular values from the SVD aren't returned, so I don't know that you can access this directly
>
> It's not clear to me why these aren't returned. The S matrix would be useful for determining a reasonable value for K.
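The two steps above can be sketched locally in NumPy. This is a stand-in for the Spark and Breeze calls, not the MLlib code itself, and the data is synthetic. Because a covariance matrix is symmetric positive semi-definite, its singular values are also its eigenvalues, which is why the SVD in step 2 yields both the sorted components and the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(42)
# 200 rows of 3-dimensional synthetic data with correlated columns.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
rows = rng.normal(size=(200, 3)) @ mixing

# Step 1: covariance matrix (n x n). In MLlib this is the distributed step;
# here it is computed locally for illustration.
cov = np.cov(rows, rowvar=False)

# Step 2: SVD of the small local matrix. Singular values come back sorted
# in descending order, which is where the ordering of the components comes from.
u, s, vt = np.linalg.svd(cov)

k = 2
components = u[:, :k]   # n x k, principal components as columns
eigenvalues = s[:k]     # variance along each of the k components

print(np.round(eigenvalues, 3))
```

With a RowMatrix, step 1 would presumably be replaced by the computeCovariance() call mentioned above, and step 2 by whatever local SVD routine is available.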
Re: spark1.0 principal component analysis
To clarify, you are looking for eigenvectors of what, the covariance matrix? So, for example, are you looking for the sqrt of the eigenvalues when you talk about the stdev of the components?

Looking at https://github.com/apache/spark/blob/1f33e1f2013c508aa86511750f7bd8437154e51a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L462 it seems that the singular values from the SVD aren't returned, so I don't know that you can access this directly. You could emulate this approach in your own code to access them, though.

But the output is pretty straightforward: it's the principal components as columns. If you have m rows of n-dimensional data and ask for k principal components, you get an n x k matrix, where the k columns are the n-dimensional principal component vectors.

On Thu, Jul 10, 2014 at 1:46 AM, fintis fin...@gmail.com wrote:
> Hi,
>
> Can anyone please shed more light on the PCA implementation in Spark? The documentation is a bit lacking, as I am not sure I understand the output. According to the docs, the output is a local matrix with the columns as principal components, sorted in descending order of covariance. This is a bit confusing for me, as I need to compute other statistics, like the standard deviation of the principal components.
>
> How do I match the principal components to the actual features, since there is some sorting? How about eigenvectors and eigenvalues?
>
> Any help shedding light on the output, how to use it further, and the PCA implementation in Spark in general would be appreciated.
>
> Thank you in earnest
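The shapes and the "sqrt of the eigenvalues" remark can be checked numerically. The following NumPy sketch uses synthetic data and illustrative variable names, not Spark code. It projects m rows of n-dimensional data onto k components and compares the sample standard deviation of each projected column with the square root of the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, k = 300, 4, 2
# Synthetic data: m rows, n features with different scales.
data = rng.normal(size=(m, n)) * np.array([5.0, 2.0, 1.0, 0.5])

cov = np.cov(data, rowvar=False)        # n x n sample covariance (ddof=1)
u, s, _ = np.linalg.svd(cov)            # s holds the eigenvalues, largest first
pc = u[:, :k]                           # n x k matrix, components as columns

# Project the centered data onto the components: one column per component.
scores = (data - data.mean(axis=0)) @ pc    # m x k
stdev = scores.std(axis=0, ddof=1)          # sample stdev of each component

print(np.round(stdev, 4))
print(np.round(np.sqrt(s[:k]), 4))
```

The two printed vectors should agree up to floating-point error, since the variance of the data along a principal component is the matching eigenvalue of the covariance matrix.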