Re: spark1.0 principal component analysis

2014-10-16 Thread al123
Hi,

I don't think anybody answered this question...


fintis wrote
 How do I match the principal components to the actual features since there
 is some sorting? 

Would anybody be able to shed a little light on it since I too am struggling
with this?

Many thanks!!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-principal-component-analysis-tp9249p16556.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: spark1.0 principal component analysis

2014-10-16 Thread Xiangrui Meng
computePrincipalComponents returns a local matrix X whose columns are the
principal components, ordered by decreasing variance. Each column vector lives
in the same feature space as the input feature vectors, so row i of X
corresponds to input feature i.
-Xiangrui
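Here is a small local illustration of this point, using NumPy rather than the MLlib API. The sorting applies only to the columns (the components), never to the rows, so row i of X always belongs to input feature i, and reading down a column gives that component's loading on each feature. `pcs` stands in for the matrix returned by computePrincipalComponents after converting it to a dense array. The loadings, the feature names, and the helper `top_features` are made up for the example.

```python
import numpy as np

def top_features(pcs, feature_names, n_top=2):
    """For each principal component (column of pcs), list the features with
    the largest absolute loadings. Row i of pcs corresponds to feature i."""
    pcs = np.asarray(pcs, dtype=float)
    assert pcs.shape[0] == len(feature_names), "expected one row per input feature"
    result = []
    for j in range(pcs.shape[1]):
        column = pcs[:, j]
        # Sort row indices by |loading|, largest first. The row index maps
        # straight back to the feature name, whatever order the columns are in.
        order = np.argsort(-np.abs(column))[:n_top]
        result.append([(feature_names[i], float(column[i])) for i in order])
    return result

# Made-up n x k loading matrix (n = 3 features, k = 2 components), only to
# show the indexing; a real one would have orthonormal columns.
pcs = np.array([[0.9, 0.1],
                [0.1, -0.95],
                [0.42, 0.29]])
names = ["height", "weight", "age"]
for j, tops in enumerate(top_features(pcs, names)):
    print(f"PC{j + 1}: {tops}")
```

When pulling the values out of an MLlib Matrix with toArray, keep in mind that the array is in column-major order.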

On Thu, Oct 16, 2014 at 2:39 AM, al123 ant.lay...@hotmail.co.uk wrote:
 Hi,

 I don't think anybody answered this question...


 fintis wrote
 How do I match the principal components to the actual features since there
 is some sorting?

 Would anybody be able to shed a little light on it since I too am struggling
 with this?

 Many thanks!!






Re: spark1.0 principal component analysis

2014-09-23 Thread st553
sowen wrote
 it seems that the singular values from the SVD aren't returned, so I don't
 know that you can access this directly

It's not clear to me why these aren't returned. The S matrix would be useful
for determining a reasonable value of K.
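Here is a sketch of how the values in S could drive the choice of K, assuming you have the eigenvalues of the covariance matrix, i.e. the variance along each component. One common heuristic is to keep the smallest K whose cumulative share of the total variance reaches a threshold. The function name `choose_k` and the example threshold are hypothetical choices, not something MLlib provides.

```python
import numpy as np

def choose_k(eigenvalues, threshold=0.95):
    """Smallest k whose top-k eigenvalues explain at least `threshold`
    of the total variance (a common heuristic, not an MLlib feature)."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    ratios = np.cumsum(ev) / ev.sum()  # cumulative explained-variance ratio
    # First index whose cumulative ratio reaches the threshold; clip in case
    # rounding leaves the last ratio a hair below 1.0.
    k = int(np.searchsorted(ratios, threshold)) + 1
    return min(k, len(ev))

print(choose_k([4.0, 2.0, 1.0, 0.5, 0.5], threshold=0.8))  # prints 3
```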



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-principal-component-analysis-tp9249p14919.html



Re: spark1.0 principal component analysis

2014-09-23 Thread Evan R. Sparks
In its current implementation, the principal components are computed in
MLlib in two steps:
1) In a distributed fashion, compute the covariance matrix - the result is
a local matrix.
2) On this local matrix, compute the SVD.

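The sorting comes from the SVD, which returns the singular values in
descending order; for a covariance matrix (symmetric positive semidefinite)
those singular values are exactly its eigenvalues. If you want to get the
eigenvalues out, you can simply run step 1 yourself on your RowMatrix via the
(experimental) computeCovariance() method, and then run SVD on the result
using a library like breeze.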

- Evan
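The following is a local NumPy sketch of those two steps, handy for checking on a small sample what comes out. It is not MLlib code: `np.cov` stands in for computeCovariance() and `np.linalg.svd` for the breeze SVD. The returned `s` holds the eigenvalues in descending order, and the leading k columns of `u` are the principal components. The random data and the helper name `pca_two_step` are illustrative assumptions.

```python
import numpy as np

def pca_two_step(data, k):
    """Sketch of PCA as covariance followed by SVD.

    data: m x n array (m rows, n features). Returns (components, eigenvalues),
    where components is n x k (one column per component) and eigenvalues holds
    all n eigenvalues of the covariance matrix in descending order.
    """
    # Step 1: n x n sample covariance matrix (features are columns, hence rowvar=False).
    cov = np.cov(data, rowvar=False)
    # Step 2: SVD of the covariance matrix; s comes back sorted in descending order.
    u, s, _vt = np.linalg.svd(cov)
    return u[:, :k], s

rng = np.random.default_rng(0)
# Hypothetical data: 200 rows, 4 features with different spreads.
data = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
components, eigenvalues = pca_two_step(data, k=2)
print(components.shape)  # (4, 2)
print(eigenvalues)       # variance along each principal axis, largest first
```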



On Tue, Sep 23, 2014 at 12:49 PM, st553 sthompson...@gmail.com wrote:

 sowen wrote
  it seems that the singular values from the SVD aren't returned, so I
 don't
  know that you can access this directly

 Its not clear to me why these aren't returned? The S matrix would be useful
 to determine a reasonable value for K.






Re: spark1.0 principal component analysis

2014-07-10 Thread Sean Owen
To clarify: you are looking for eigenvectors of what, the covariance matrix?
For example, when you talk about the stdev of components, are you looking for
the square roots of the eigenvalues?

Looking at
https://github.com/apache/spark/blob/1f33e1f2013c508aa86511750f7bd8437154e51a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L462
it seems that the singular values from the SVD aren't returned, so I don't
know that you can access this directly.

You could, though, emulate this approach in your own code to access them.

But the output itself is straightforward: it's the principal components as
columns. If you have m rows of n-dimensional data and ask for k principal
components, you get an n x k matrix, where the k columns are the
n-dimensional principal component vectors.
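To illustrate these shapes with a local NumPy sketch rather than MLlib: multiplying the m x n data by the n x k component matrix gives m x k scores, one column per component, and the sample standard deviation of each score column equals the square root of the matching covariance eigenvalue, which is one reading of "stdev of components". In MLlib the analogous projection would be RowMatrix.multiply with the returned matrix. The mixing matrix and sizes below are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 500, 3, 2
# Hypothetical m x n data with correlated features.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.3]])
data = rng.normal(size=(m, n)) @ mixing

cov = np.cov(data, rowvar=False)  # n x n
u, s, _vt = np.linalg.svd(cov)
pcs = u[:, :k]                    # n x k: columns are principal components

scores = data @ pcs               # m x k: each row projected onto the k components
print(pcs.shape, scores.shape)    # (3, 2) (500, 2)

# Sample stdev of each score column matches sqrt of the corresponding eigenvalue.
# Centering the data first would shift the scores but not change their spread.
print(scores.std(axis=0, ddof=1))
print(np.sqrt(s[:k]))
```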


On Thu, Jul 10, 2014 at 1:46 AM, fintis fin...@gmail.com wrote:

 Hi,

 Can anyone please shed more light on the PCA implementation in Spark? The
 documentation is a bit lacking, and I am not sure I understand the output.
 According to the docs, the output is a local matrix with the columns as
 principal components, and the columns sorted in descending order of variance.
 This is a bit confusing for me, as I need to compute other statistics like the
 standard deviation of the principal components. How do I match the principal
 components to the actual features, since there is some sorting? What about
 eigenvectors and eigenvalues?

 Any help shedding light on the output, how to use it further, and the Spark
 PCA implementation in general would be appreciated.

 Thank you in earnest



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-principal-component-analysis-tp9249.html