[ 
https://issues.apache.org/jira/browse/MADLIB-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1001:
------------------------------------
    Description: 
Story

As a data scientist, I want to perform session reconstruction on my data set, 
so that I can prepare it as input for other algorithms, such as path functions 
or predictive analytics algorithms.

This is a follow-on to 
https://issues.apache.org/jira/browse/MADLIB-909
to add optional output controls.

Details 

Proposed interface changes:

{code}
sessionize (
   source_table,
   output_table,
   partition_expr,
   time_stamp,
   max_time,
   output_cols,  -- new
   create_view   -- new
   )
{code}
where

output_cols (optional)
        TEXT.  
'*' (asterisk) -- all columns in the input table + session column (default)
'x, y, z, ...' -- list of columns you want + session column.  This list can 
include the partition expression or other expressions as desired.  It should 
also support '*, expr1, expr2, ...', which means output all columns + the 
extra expressions listed.  Must be a valid SELECT expression.

For example, in the path function 
http://madlib.incubator.apache.org/docs/latest/group__grp__path.html#examples
we do a similar thing for the aggregate function parameter.
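
One way to read the output_cols rules above is as a small function that builds 
the SELECT list. The Python sketch below is illustrative only: the helper name 
build_select_list and the session column name session_id are assumptions, not 
part of this proposal, and the sketch does not check that the text is a valid 
SELECT expression.

```python
def build_select_list(output_cols="*", session_col="session_id"):
    """Return the SELECT list for the sessionized output.

    Hypothetical helper: the name, the signature, and the `session_id`
    column name are assumptions made for illustration.
    """
    cols = (output_cols or "").strip()
    if not cols:
        cols = "*"  # treat a missing or empty value as the default
    # The session column is always appended, whatever the caller asked for.
    return f"{cols}, {session_col}"


print(build_select_list())
print(build_select_list("user_id, page"))
print(build_select_list("*, upper(page) AS page_uc"))
```

All three forms described above ('*', a column list, and '*' plus extra 
expressions) pass through unchanged, with the session column added at the end.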

create_view (optional)
        BOOLEAN.  Default: TRUE.  Determines whether to create a view or 
materialize a table as output.  If you only need the session info once, 
creating a view could be significantly faster than materializing a table.
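
The difference between the two create_view settings can be sketched with 
Python's built-in sqlite3 module. This is not the MADlib implementation: SQLite 
(3.25 or later, for window functions) stands in for the database, the events 
table and the create_sessions helper are hypothetical, timestamps are plain 
integer seconds, and max_time is a number rather than an SQL interval.

```python
import sqlite3

# Stand-in query: user_id plays the role of partition_expr and ts the role
# of time_stamp. A new session starts when the gap to the previous event in
# the same partition exceeds max_time, or when there is no previous event.
SESSION_QUERY = """
SELECT user_id, ts, page,
       SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts) AS session_id
FROM (
    SELECT user_id, ts, page,
           CASE WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts)
                     <= {max_time}
                THEN 0 ELSE 1 END AS new_session
    FROM events
)
"""


def create_sessions(conn, output_name, max_time, create_view=True):
    """Create the sessionized output as a view or as a materialized table."""
    kind = "VIEW" if create_view else "TABLE"
    conn.execute(f"CREATE {kind} {output_name} AS "
                 + SESSION_QUERY.format(max_time=int(max_time)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, ts INTEGER, page TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(1, 0, "a"), (1, 30, "b"), (1, 500, "c"), (2, 10, "a")])

create_sessions(conn, "sessions_v", max_time=60, create_view=True)
create_sessions(conn, "sessions_t", max_time=60, create_view=False)

# A row added after creation shows up in the view but not in the table.
conn.execute("INSERT INTO events VALUES (2, 20, 'b')")
for name in ("sessions_v", "sessions_t"):
    count = conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
    print(name, count)
```

The view stores only the query, so creating it is cheap and it is recomputed 
each time it is read; the table is a snapshot that costs more to create but can 
be read repeatedly without redoing the window computation.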

  was:
Story

As a data scientist, I want to perform session reconstruction on my data set, 
so that I can prepare for input into other algorithms like path functions, or 
predictive analytics algorithms.

This is a follow on to 
https://issues.apache.org/jira/browse/MADLIB-909
to add optional output controls.

Details 

Proposed interface changes:

{code}
sessionize (
   source_table,
   output_table,
   partition_expr,
   time_stamp,
   max_time,
   output_cols -- new
   create_view   -- new
   )
{code}
where

output_cols (optional)
        TEXT.  
 asterisk (i.e., '*') -- ALL columns in input table + session column (default)
'x, y, z, ...' -- list of columns you want + session column.  This list could 
include the partition expression or other expressions as desired.  This should 
also support '*, expr1, expr2, etc.' where this means output all columns + the 
extra expressions listed.

create_view (optional)
        BOOLEAN default: TRUE. Determines whether to create a view or 
materialize a table as output. If you only needed session info once, creating a 
view could be significantly faster than materializing as a table.


> Sessionization - Phase 2 (output controls)
> ------------------------------------------
>
>                 Key: MADLIB-1001
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1001
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Nandish Jayaram
>            Priority: Minor
>              Labels: gsoc2016, starter
>             Fix For: v1.9.1
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
