[ https://issues.apache.org/jira/browse/MADLIB-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163525#comment-15163525 ]

ASF GitHub Bot commented on MADLIB-948:
---------------------------------------

GitHub user orhankislal opened a pull request:

    https://github.com/apache/incubator-madlib/pull/24

    PCA: Proportion of variance for PCA training function

    JIRA: MADLIB-948
    Minor fixes:
    - Added online help for pca_train and pca_sparse_train
    - Unified error messages for clarity
    - Fixed bug with a variance border case (1.0)
    - Fixed docs to reflect correct mean table/column name
    - Fixed docs to reflect the allowed ranges for components_param

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/orhankislal/incubator-madlib func/pca_prop

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/24.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #24
    
----
commit ba7db1c5fa70a9b5ffd06e178cb892f907c85d77
Author: Orhan Kislal <[email protected]>
Date:   2016-02-24T00:26:25Z

    PCA: Proportion of variance for PCA training function
    
    JIRA: MADLIB-948
    Minor fixes:
    -Added online help for pca_train and pca_sparse_train
    -Unified error messages for clarity
    -Fixed bug with a variance border case(1.0)
    -Fixed docs to reflect correct mean table/column name
    -Fixed docs to reflect the allowed ranges for components_param

----


> Proportion of variance for PCA training function
> ------------------------------------------------
>
>                 Key: MADLIB-948
>                 URL: https://issues.apache.org/jira/browse/MADLIB-948
>             Project: Apache MADlib
>          Issue Type: New Feature
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v2.0
>
>
> In future iterations of the pca_train command, is it feasible to add
> another optional parameter called variance_proportion? Instead of specifying
> k principal components to compute, you would specify the proportion of
> variance that you want your PCA vectors to account for. The number of
> principal vectors generated would depend on the covariance matrix or
> correlation matrix (depending on whether you normalized the data) and on
> variance_proportion. So if I specified variance_proportion = .8, the
> algorithm would terminate after obtaining enough principal vectors that the
> ratio of the sum of the eigenvalues collected so far to the trace of the
> covariance/correlation matrix (the sum of all of its eigenvalues) is greater
> than or equal to .8. That is, the algorithm would terminate after collecting
> enough vectors to account for 80% of the total variance in the set of
> observations.
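The selection rule requested above can be sketched as follows. This is an illustrative Python sketch, not MADlib's implementation: the function name `components_for_proportion` is hypothetical, and it assumes the eigenvalues of the covariance (or correlation) matrix have already been computed. It returns the smallest k whose k largest eigenvalues account for at least the requested proportion of the trace.

```python
from itertools import accumulate
from typing import Sequence


def components_for_proportion(eigenvalues: Sequence[float], proportion: float) -> int:
    """Hypothetical helper: smallest k such that the k largest eigenvalues
    account for at least `proportion` of the total variance (the trace)."""
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in the range (0, 1]")
    if not eigenvalues or any(v < 0 for v in eigenvalues):
        raise ValueError("eigenvalues must be non-empty and non-negative")

    # Sort descending so the leading components explain the most variance.
    ordered = sorted(eigenvalues, reverse=True)
    # Running (prefix) sums of the eigenvalues; the last one is the trace.
    # Taking the trace from the same accumulation makes the final ratio
    # exactly 1.0, so proportion = 1.0 is reachable without a tolerance.
    prefix = list(accumulate(ordered))
    trace = prefix[-1]
    if trace == 0.0:
        raise ValueError("total variance is zero")

    # Stop at the first k whose cumulative share reaches the target.
    for k, running in enumerate(prefix, start=1):
        if running / trace >= proportion:
            return k
    return len(ordered)  # not reached: the last ratio is exactly 1.0
```

For example, with eigenvalues 4, 3, 2, 1 (trace 10) and variance_proportion = .8, the first two components explain 7/10 = 0.7 and the first three explain 9/10 = 0.9, so the sketch returns 3. The proportion = 1.0 border case is handled by deriving the trace from the same running sum rather than from a separate summation, which could differ by rounding.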



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
