Frank McQuillan created MADLIB-948:
--------------------------------------

             Summary: Proportion of variance for PCA training function
                 Key: MADLIB-948
                 URL: https://issues.apache.org/jira/browse/MADLIB-948
             Project: Apache MADlib
          Issue Type: New Feature
            Reporter: Frank McQuillan


In future iterations of the pca_train command, is it feasible to add another
optional parameter called variance_proportion? Instead of specifying the number
k of principal components to compute, you would specify the proportion of
variance that you want your PCA vectors to account for. The number of principal
vectors generated would then depend on the covariance matrix (or the
correlation matrix, if the data were normalized) and on variance_proportion.
For example, if I specified variance_proportion = .8, the algorithm would stop
as soon as it had obtained enough principal vectors (taken in decreasing order
of eigenvalue) that the ratio of the sum of the eigenvalues collected so far to
the trace of the covariance/correlation matrix (the sum of all of its
eigenvalues) is greater than or equal to .8. That is, the algorithm would
terminate after collecting enough vectors to account for 80% of the total
variance in the set of observations.
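The stopping rule described above picks the smallest k such that
(lambda_1 + ... + lambda_k) / trace(C) >= variance_proportion, where
lambda_1 >= lambda_2 >= ... are the eigenvalues of the covariance (or
correlation) matrix C. A minimal sketch of that rule, assuming NumPy and a
full in-memory eigendecomposition; num_components_for_variance is a
hypothetical helper for illustration only, not part of MADlib's pca_train API:

```python
import numpy as np


def num_components_for_variance(data, variance_proportion, normalize=False):
    """Hypothetical helper (not a MADlib function): return (k, eigenvalues),
    where k is the smallest number of principal components whose eigenvalues
    account for at least `variance_proportion` of the total variance."""
    if not 0.0 < variance_proportion <= 1.0:
        raise ValueError("variance_proportion must be in (0, 1]")

    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)  # center each column
    if normalize:
        # Scaling columns to unit variance turns the covariance matrix
        # into the correlation matrix.
        x = x / x.std(axis=0, ddof=1)

    cov = np.cov(x, rowvar=False)

    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending
    # order; reverse so the largest (most variance) comes first. Clip tiny
    # negative values caused by rounding.
    eigvals = np.clip(np.linalg.eigvalsh(cov)[::-1], 0.0, None)

    total = eigvals.sum()  # equals trace(cov), up to rounding
    if total <= 0.0:
        raise ValueError("data has zero total variance")

    ratios = np.cumsum(eigvals) / total  # cumulative proportion of variance
    # First index whose cumulative ratio is >= variance_proportion.
    k = int(np.searchsorted(ratios, variance_proportion, side="left")) + 1
    return min(k, len(eigvals)), eigvals


# Usage: column variances 8/3 and 2/3, so the first component explains 80%.
data = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
k, eigvals = num_components_for_variance(data, 0.75)
print(k, eigvals)
```

With normalize=True the same data yields an identity correlation matrix, so
each component explains 50% and variance_proportion = .8 requires both.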



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
