[ 
https://issues.apache.org/jira/browse/MADLIB-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1001:
------------------------------------
    Description: 
Story

As a data scientist, I want to perform session reconstruction on my data set 
so that I can prepare it as input to other algorithms, such as path functions 
or predictive analytics algorithms.

This is a follow-on to 
https://issues.apache.org/jira/browse/MADLIB-909
to add optional output controls.

Details 

Proposed interface changes:

{code}
sessionize (
   source_table,
   output_table,
   partition_expr,
   time_stamp,
   max_time,
   output_cols,  -- new
   create_view   -- new
   )
{code}
where

output_cols (optional)
        TEXT.  Controls which columns are output.  If NULL (default), only the 
partition, time stamp, and generated session ID columns are output.  (The 
assumption is that the partition columns together with the time stamp column 
are sufficient to join back to the input table.)  If '*', all columns from the 
source table are output in addition to the generated session ID.  Otherwise, 
the user can provide a specific list of columns to output, e.g. 
'x, y, z, ...'.
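
As an illustration only, the three output_cols cases could be resolved along 
the lines of the Python sketch below.  It is not the MADlib implementation.  
The helper name resolve_output_cols and the column name session_id are 
hypothetical, partition_expr is assumed to be a plain list of column names, 
and appending the session ID to an explicit column list is an assumption that 
the description does not settle.

```python
from typing import List, Optional

SESSION_ID_COL = "session_id"  # assumed name of the generated column


def resolve_output_cols(
    output_cols: Optional[str],
    source_cols: List[str],
    partition_cols: List[str],
    time_stamp: str,
) -> List[str]:
    """Return the ordered list of columns the output would contain."""
    if output_cols is None:
        # Default: partition columns and time stamp (session ID added below).
        selected = list(partition_cols) + [time_stamp]
    elif output_cols.strip() == "*":
        # All source columns (session ID added below).
        selected = list(source_cols)
    else:
        # Explicit comma-separated list such as 'x, y, z'.
        selected = [c.strip() for c in output_cols.split(",") if c.strip()]
        unknown = [c for c in selected if c not in source_cols]
        if unknown:
            raise ValueError(f"unknown columns: {unknown}")
    # Assumption: the generated session ID is always part of the output.
    return selected + [SESSION_ID_COL]


cols = ["user_id", "event_time", "page", "revenue"]
default_cols = resolve_output_cols(None, cols, ["user_id"], "event_time")
all_cols = resolve_output_cols("*", cols, ["user_id"], "event_time")
some_cols = resolve_output_cols("page, revenue", cols, ["user_id"], "event_time")
```

With the default, the output carries just enough columns (here user_id and 
event_time) to join back to the source table, which is the rationale given 
above for making NULL the default.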

create_view (optional)
        BOOLEAN, default TRUE.  Determines whether to create a view or 
materialize a table as output.  If you only need the session info once, 
creating a view could be significantly faster than materializing a table.
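
The difference between the two create_view settings can be sketched as 
follows.  This is a minimal, hypothetical example: it uses Python's built-in 
sqlite3 module only so that it is self-contained (MADlib itself targets 
PostgreSQL and Greenplum), the helper write_output and the table names are 
made up, and a plain SELECT stands in for the sessionized query.

```python
import sqlite3


def write_output(conn, output_table, select_sql, create_view=True):
    """Create the output as a view (default) or as a materialized table."""
    kind = "VIEW" if create_view else "TABLE"
    # Identifiers are interpolated directly here; real code should validate them.
    conn.execute(f"CREATE {kind} {output_table} AS {select_sql}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_time INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 10), (1, 20), (2, 15)])

# Stand-in for the query that would compute session IDs.
query = "SELECT user_id, event_time FROM events"
write_output(conn, "sessions_view", query, create_view=True)
write_output(conn, "sessions_table", query, create_view=False)

# A row added afterwards is visible through the view but not in the table,
# because the table is a snapshot taken when it was created.
conn.execute("INSERT INTO events VALUES (2, 99)")
view_rows = conn.execute("SELECT COUNT(*) FROM sessions_view").fetchone()[0]
table_rows = conn.execute("SELECT COUNT(*) FROM sessions_table").fetchone()[0]
```

Creating the view stores only the query, so no rows are written up front.  The 
cost is paid each time the view is read, which is why a view suits the "needed 
once" case described above.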

  was:
Story

As a data scientist, I want to perform session reconstruction on my data set 
so that I can prepare it as input to other algorithms, such as path functions 
or predictive analytics algorithms.

This is a follow-on to 
https://issues.apache.org/jira/browse/MADLIB-909
to add optional output controls.

Details 

Proposed interface changes:

{code}
sessionize (
   source_table,
   output_table,
   partition_expr,
   order_expr,
   time_stamp,
   time_out,
   output_cols,  -- new
   create_view   -- new
   )
{code}
where

output_cols (optional)
        TEXT.  Controls which columns are output.  If NULL (default), only the 
partition, time stamp, and generated session ID columns are output.  (The 
assumption is that the partition columns together with the time stamp column 
are sufficient to join back to the input table.)  If '*', all columns from the 
source table are output in addition to the generated session ID.  Otherwise, 
the user can provide a specific list of columns to output, e.g. 
'x, y, z, ...'.

create_view (optional)
        BOOLEAN, default TRUE.  Determines whether to create a view or 
materialize a table as output.  If you only need the session info once, 
creating a view could be significantly faster than materializing a table.


> Sessionization - Phase 2 (output controls)
> ------------------------------------------
>
>                 Key: MADLIB-1001
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1001
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Nandish Jayaram
>            Priority: Minor
>              Labels: gsoc2016, starter
>             Fix For: v1.9.1


