[ 
https://issues.apache.org/jira/browse/MADLIB-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684648#comment-15684648
 ] 

Frank McQuillan commented on MADLIB-947:
----------------------------------------


1) Interface

{code}
grouping_cols (optional)
TEXT, default: NULL. An expression list used to group the input dataset into 
discrete groups, running one model per group. Similar to the SQL "GROUP BY" 
clause. When this value is NULL, no grouping is used and a single model is 
generated.
{code}

IF dense 1

* specified: row_id and grouping_cols
* inferred:  row_vec is a numeric array defining the matrix values (to be cast 
to FLOAT8[])
* errors: throw an error if the input table has more than one numeric array 
column
* ignore any other columns that do not affect above logic

IF dense 2

* specified: row_id and grouping_cols
* inferred:  numeric columns become matrix values (to be cast each to FLOAT8)
* ignore any other columns that do not affect above logic

IF sparse

* specified:  everything that is needed - row_id, col_id, val_id, grouping_cols
* ignore any other columns that do not affect above logic
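
The dense rules above can be sketched roughly as follows. This is only an 
illustration in Python, not MADlib code: the function name, the type list, and 
the error for "no numeric columns" are hypothetical. It also assumes that a 
table with a numeric array column is treated as "dense 1" and anything else as 
"dense 2", and that row_id and grouping_cols are excluded from the matrix 
values; the comment above does not spell out either point.

```python
# Hypothetical sketch of the dense-format inference rules; not MADlib's API.
NUMERIC_TYPES = {
    "smallint", "integer", "bigint", "real", "double precision", "numeric",
}


def is_numeric_array(sql_type):
    """True for types such as 'double precision[]' or 'integer[]'."""
    return sql_type.endswith("[]") and sql_type[:-2] in NUMERIC_TYPES


def infer_dense_layout(columns, row_id, grouping_cols=()):
    """Return (layout, value_columns) for a dense input table.

    columns maps column name -> SQL type name, in table order.
    """
    # Assumption: the specified columns never count as matrix values.
    reserved = {row_id, *grouping_cols}
    candidates = {n: t for n, t in columns.items() if n not in reserved}

    arrays = [n for n, t in candidates.items() if is_numeric_array(t)]
    if len(arrays) > 1:
        # "dense 1" error rule: more than one numeric array is ambiguous.
        raise ValueError("more than one numeric array column: %s" % arrays)
    if len(arrays) == 1:
        # "dense 1": the single array is row_vec (to be cast to FLOAT8[]).
        return ("dense 1", arrays)

    # "dense 2": every remaining numeric column is a matrix value
    # (each to be cast to FLOAT8); other columns are ignored.
    scalars = [n for n, t in candidates.items() if t in NUMERIC_TYPES]
    if not scalars:
        raise ValueError("no numeric columns found for matrix values")
    return ("dense 2", scalars)
```

For example, a table with columns id, grp, row_vec (double precision[]) and a 
text note would resolve to "dense 1" with row_vec as the only value column, 
and the note column would be ignored.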


2) Performance

Please use the group iteration controller so that grouping is handled by the 
query processor, rather than iterating over the groups in a plain for-loop, 
which would be slow.
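
As a generic illustration of that difference (this is not the group iteration 
controller itself, whose interface is not shown here), the sketch below uses 
Python's sqlite3 module with a made-up table t(grp, val). The first function 
issues one query per group from a client-side loop; the second lets the 
database do the grouping in a single statement.

```python
import sqlite3

# Hypothetical toy table; names and values are for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, val REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)


def means_by_loop(conn):
    """One query per group: the number of round trips grows with the groups."""
    groups = [r[0] for r in conn.execute("SELECT DISTINCT grp FROM t")]
    result = {}
    for g in groups:
        row = conn.execute(
            "SELECT avg(val) FROM t WHERE grp = ?", (g,)
        ).fetchone()
        result[g] = row[0]
    return result


def means_by_group_by(conn):
    """A single statement: the query processor handles all groups at once."""
    return dict(conn.execute("SELECT grp, avg(val) FROM t GROUP BY grp"))
```

Both functions return the same per-group values; only the second keeps the 
grouping inside the database.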




> Support grouping for PCA
> ------------------------
>
>                 Key: MADLIB-947
>                 URL: https://issues.apache.org/jira/browse/MADLIB-947
>             Project: Apache MADlib
>          Issue Type: New Feature
>            Reporter: Frank McQuillan
>             Fix For: v1.10
>
>
> Implement grouping support in PCA
> http://doc.madlib.net/latest/group__grp__pca__train.html#train



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
