[
https://issues.apache.org/jira/browse/MADLIB-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786369#comment-15786369
]
ASF GitHub Bot commented on MADLIB-947:
---------------------------------------
GitHub user njayaram2 opened a pull request:
https://github.com/apache/incubator-madlib/pull/84
PCA: Add grouping support to PCA
JIRA: MADLIB-947
- PCA can now handle grouping columns. When pca_train() is called with
the grouping_cols parameter, it learns an independent model for each
group in the input table. New columns corresponding to the columns
specified in grouping_cols are created in the output, mean, and summary
tables.
- If pca_project() is called on an input table that has grouping_cols
in it, the pc_table passed in the parameter list must be a PCA model
table that was learnt with grouping_cols. If the input table for
pca_project() has grouping columns but the pc_table does not support
grouping_cols, or vice versa, an error is thrown.
- Another important change concerns the 'row_id' column. Previously, the
'row_id' column in the input tables had to be serially increasing,
starting from 1. That requirement is now relaxed, since this commit
converts the given 'row_id' to a new column that follows the rules laid
out by the sparse and dense matrix formats.
- Both the online and user docs are improved with more examples.
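To illustrate the grouping and 'row_id' points above, here is a minimal
Python sketch. It is not MADlib's implementation: the function name
remap_row_ids, the column name 'grp', and the choice to renumber within
each group (rather than across the whole table) are assumptions made
only for illustration. It shows arbitrary row ids being partitioned by
group and mapped to a new column that increases serially from 1.

```python
from collections import defaultdict

def remap_row_ids(rows, group_key="grp", id_key="row_id"):
    """Partition rows by group and add a new_row_id running 1..n per group.

    Hypothetical helper: the original row_id is kept, and rows are
    numbered in ascending order of that original id.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = {}
    for group, members in groups.items():
        ordered = sorted(members, key=lambda r: r[id_key])
        # Copy each row and attach the serial id, starting from 1.
        result[group] = [
            dict(r, new_row_id=i) for i, r in enumerate(ordered, start=1)
        ]
    return result

# Row ids here are neither serial nor starting from 1.
rows = [
    {"row_id": 10, "grp": "a", "val": [1.0, 2.0]},
    {"row_id": 42, "grp": "b", "val": [0.5, 0.1]},
    {"row_id": 7, "grp": "a", "val": [3.0, 4.0]},
    {"row_id": 99, "grp": "b", "val": [0.2, 0.9]},
]
remapped = remap_row_ids(rows)
```

Each entry of remapped then holds the rows that one independent model
would be trained on, with ids that a dense or sparse matrix layout
could index directly.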
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/njayaram2/incubator-madlib features/pca-grouping-simple
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-madlib/pull/84.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #84
----
commit cfdddb490695782a38a56aa0c9a635c063fd916b
Author: Nandish Jayaram <[email protected]>
Date: 2016-12-21T22:18:38Z
PCA: Add grouping support to PCA
JIRA: MADLIB-947
----
> Support grouping for PCA
> ------------------------
>
> Key: MADLIB-947
> URL: https://issues.apache.org/jira/browse/MADLIB-947
> Project: Apache MADlib
> Issue Type: New Feature
> Reporter: Frank McQuillan
> Assignee: Nandish Jayaram
> Fix For: v1.10
>
>
> Implement grouping support in PCA
> http://doc.madlib.net/latest/group__grp__pca__train.html#train
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)