[jira] [Comment Edited] (MADLIB-1171) Support model versioning in output tables

Frank McQuillan (JIRA) Wed, 01 Nov 2017 14:57:55 -0700

    [ https://issues.apache.org/jira/browse/MADLIB-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234806#comment-16234806 ]


Frank McQuillan edited comment on MADLIB-1171 at 11/1/17 9:56 PM:
------------------------------------------------------------------

Other:

If a model output table requires storage attributes that support a unique 
index, this should be specified in the code that creates the table.
Some users use append-optimized compressed tables by default, which do not 
work with most MADlib functions.

{code}
gp 09:39:02=> SELECT madlib.logregr_train(
gp(>     'patients',                                 -- source table
gp(>     'patients_logregr',                         -- output table
gp(>     'second_attack',                            -- labels
gp(>     'ARRAY[1, treatment, trait_anxiety]',       -- features
gp(>     NULL,                                       -- grouping columns
gp(>     20,                                         -- max number of iterations
gp(>     'irls'                                      -- optimizer
gp(>     );
ERROR:  plpy.SPIError: append-only tables do not support unique indexes (plpython.c:4656)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "logregr_train", line 19, in <module>
    return logistic.logregr_train(**globals())
  PL/Python function "logregr_train", line 133, in logregr_train
  PL/Python function "logregr_train", line 260, in __logregr_train_compute
  PL/Python function "logregr_train", line 75, in __compute_logregr
  PL/Python function "logregr_train", line 127, in __enter__
  PL/Python function "logregr_train", line 197, in runSQL
PL/Python function "logregr_train"
{code}
 
We don't want to force users to reset storage parameters every time they need 
to call a MADlib function, e.g.:
 
ALTER ROLE xxx SET gp_default_storage_options TO 'appendonly=false';
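
One way to read the suggestion above is that whatever code creates a table that later needs a unique index should state the storage type itself, so that a role-level append-optimized default does not apply. The following is only a sketch in Python (the language of the functions in the traceback). The helper name `create_heap_table_sql` and the column list are hypothetical and are not taken from the MADlib source. It assumes that a table-level `WITH (appendonly=false)` clause takes precedence over `gp_default_storage_options`.

```python
import re

# Hypothetical helper, not part of MADlib: builds the DDL for a model output
# table and pins the storage type in the statement itself.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def create_heap_table_sql(table_name, column_defs):
    """Return CREATE TABLE DDL that explicitly requests heap storage."""
    # Accept only plain or schema-qualified identifiers, since the name is
    # interpolated directly into the SQL text.
    if not _IDENT.match(table_name):
        raise ValueError("unexpected table name: %r" % (table_name,))
    columns = ", ".join(column_defs)
    # Assumption: the table-level WITH clause overrides the session or role
    # default set through gp_default_storage_options.
    return "CREATE TABLE {0} ({1}) WITH (appendonly=false)".format(
        table_name, columns
    )


# Illustrative column list only.
ddl = create_heap_table_sql(
    "patients_logregr",
    ["coef double precision[]", "log_likelihood double precision"],
)
print(ddl)
```

With the storage type fixed in the DDL, the caller would not have to change `gp_default_storage_options` before calling the function.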



> Support model versioning in output tables
> -----------------------------------------
>
>                 Key: MADLIB-1171
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1171
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: All Modules
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v2.0
>
>         Attachments: p100.png, p101.png
>
>
> Context
> For many MADlib modules, <out_table> contains the separate models for each 
> group and <out_table>_summary contains the common model data for all groups. 
> Model versioning can be awkward since the model output table and model 
> summary table need to be explicitly dropped between runs.
> Story
> As a data scientist, I want to perform multiple runs without having to drop 
> tables, so that I can easily get a history of the model runs with clear 
> versioning.
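
For reference, the manual cleanup that the Context paragraph describes amounts to two drops before each rerun under the same output name. The sketch below only generates the statements. The helper name `drop_model_tables_sql` is hypothetical and is not a MADlib function.

```python
# Illustrative only: not a MADlib function.
def drop_model_tables_sql(out_table):
    """Return the DROP statements needed before reusing an output table name."""
    return [
        # Per-group models.
        "DROP TABLE IF EXISTS {0}".format(out_table),
        # Model data common to all groups.
        "DROP TABLE IF EXISTS {0}_summary".format(out_table),
    ]


for stmt in drop_model_tables_sql("patients_logregr"):
    print(stmt)
```

Both tables have to go, since a run writes the model table and its `_summary` companion together.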



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
