[ https://issues.apache.org/jira/browse/MADLIB-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150925#comment-15150925 ]

ASF GitHub Bot commented on MADLIB-948:
---------------------------------------

GitHub user orhankislal opened a pull request:

    https://github.com/apache/incubator-madlib/pull/17

    PCA: Proportion of variance for PCA training function

    JIRA: MADLIB-948
    - Added new functionality that lets the user specify the proportion of 
variance to be covered by the principal components. Instead of an integer k, 
the new function accepts a float value between 0 and 1 (see the sketch after 
this list).
    - The interface has been updated with new parameter names to reflect the 
change.
    - The sparse and block variants of PCA have been updated to support this 
functionality.
    - The proportion of variance covered by each principal component is now 
included in the output of both the new function and the existing one.
    - The implementation required splitting the SVD function into two parts 
and adding several layers of wrappers, so that the general SVD interface is 
unchanged while PCA has enough access to manipulate the intermediate tables.
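
For reference, here is a minimal sketch of how a proportion-of-variance cutoff 
can be mapped to a number of principal components. This is not MADlib's actual 
implementation; the helper name, the use of NumPy, and the conversion from 
singular values to covariance-matrix eigenvalues are illustrative assumptions.

    import numpy as np

    def components_for_proportion(singular_values, proportion, n_rows):
        """Illustrative only: choose the smallest k whose principal
        components cover at least `proportion` of the total variance."""
        # Eigenvalues of the covariance matrix from the singular values of
        # the centered data matrix: lambda_i = s_i**2 / (n_rows - 1).
        eigenvalues = np.asarray(singular_values, dtype=float) ** 2 / (n_rows - 1)
        # Proportion of variance explained by each principal component.
        per_component = eigenvalues / eigenvalues.sum()
        # Smallest k whose cumulative proportion reaches the target.
        cumulative = np.cumsum(per_component)
        k = int(min(np.searchsorted(cumulative, proportion) + 1, len(cumulative)))
        return k, per_component

    # Example: singular values from an SVD of a centered 100-row matrix.
    k, explained = components_for_proportion([20.0, 12.0, 5.0, 1.0], 0.9, n_rows=100)
    print(k, explained.round(3))  # -> 2 [0.702 0.253 0.044 0.002]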

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/orhankislal/incubator-madlib func/pca_prop

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/17.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17
    
----
commit efa96cb43344aed29815dbcfb93d10be212f1a7e
Author: Orhan Kislal <[email protected]>
Date:   2016-02-17T18:11:52Z

    PCA: Proportion of variance for PCA training function
    
    JIRA: MADLIB-948
    - Added new functionality that lets the user specify the proportion of 
variance to be covered by the principal components. Instead of an integer k, 
the new function accepts a float value between 0 and 1.
    - The interface has been updated with new parameter names to reflect the 
change.
    - The sparse and block variants of PCA have been updated to support this 
functionality.
    - The proportion of variance covered by each principal component is now 
included in the output of both the new function and the existing one.
    - The implementation required splitting the SVD function into two parts 
and adding several layers of wrappers, so that the general SVD interface is 
unchanged while PCA has enough access to manipulate the intermediate tables.

----


> Proportion of variance for PCA training function
> ------------------------------------------------
>
>                 Key: MADLIB-948
>                 URL: https://issues.apache.org/jira/browse/MADLIB-948
>             Project: Apache MADlib
>          Issue Type: New Feature
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v2.0
>
>
> In future iterations of the pca_train command, is it feasible to add 
> another optional parameter called variance_proportion? Instead of specifying 
> k principal components to compute, you would specify the proportion of 
> variance that you want your PCA vectors to account for. The number of 
> principal vectors generated would depend on the covariance matrix/correlation 
> matrix (depending on whether you normalized or not) and on 
> variance_proportion. So if I specified variance_proportion = .8, the 
> algorithm would terminate after obtaining enough principal vectors that the 
> ratio of the sum of the eigenvalues collected so far to the trace of the 
> covariance matrix/correlation matrix (the sum of all of its eigenvalues) is 
> greater than or equal to .8. That is, the algorithm would terminate after 
> collecting enough vectors to account for 80% of the total variance in the 
> set of observations.
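
As a concrete illustration of the criterion described above, a short numeric 
sketch with made-up eigenvalues:

    import numpy as np

    # Hypothetical eigenvalues of the covariance matrix, in descending order.
    eigenvalues = np.array([4.0, 3.0, 2.0, 1.0])
    trace = eigenvalues.sum()                # sum of all eigenvalues = 10
    ratios = np.cumsum(eigenvalues) / trace  # [0.4, 0.7, 0.9, 1.0]
    k = int(np.argmax(ratios >= 0.8)) + 1    # first component reaching the threshold
    print(k)                                 # -> 3 vectors cover >= 80% of the variance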


