[ https://issues.apache.org/jira/browse/MADLIB-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-1002:
------------------------------------
    Description: 
Story

As a data scientist, I want to perform session reconstruction on my data set, 
so that I can prepare it for input into other algorithms, such as path 
functions or predictive analytics algorithms.

This is a follow-on to 
https://issues.apache.org/jira/browse/MADLIB-909
https://issues.apache.org/jira/browse/MADLIB-1001
that adds a minimum time parameter.

Details 

Add min_time to the existing parameters.

Proposed interface changes:

{code}
sessionize (
   source_table,
   output_table,
   partition_expr,
   order_expr,
   time_stamp,
   time_out,
   min_time,   -- new
   output_cols,
   create_view 
   )
{code}
where

min_time (optional)
Minimum time that must elapse since the last valid event for an event to be 
considered valid (default=0).  An event that occurs less than min_time after 
the last valid event is dropped and is not included in the current session.  
Same units as time_stamp.

Implementation notes

1) min_time should be specified in the same units as the time_out parameter.   
2) Always compare against the last valid event in the session, not against 
events that were just dropped.
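
To make note 2 concrete, here is a minimal Python sketch of the min_time rule 
for a single partition.  It is an illustration only, not the MADlib 
implementation.  The helper name sessionize_events is hypothetical, timestamps 
are assumed to be numeric and already sorted, and the rule that a gap larger 
than time_out starts a new session is an assumption about how time_out behaves.

```python
from typing import List, Optional, Tuple


def sessionize_events(timestamps: List[float], time_out: float,
                      min_time: float = 0.0) -> List[Tuple[float, int]]:
    """Return (timestamp, session_id) for every kept event of one partition.

    Hypothetical helper for illustration. Assumes timestamps are sorted in
    ascending order and use the same units as time_out and min_time.
    """
    kept: List[Tuple[float, int]] = []
    session_id = 0
    last_valid: Optional[float] = None  # timestamp of the last kept event
    for ts in timestamps:
        if last_valid is None:
            session_id = 1  # the first event always opens session 1
        else:
            delta = ts - last_valid
            if delta < min_time:
                # Dropped. last_valid is deliberately NOT updated, so the
                # next event is still compared against the last valid one.
                continue
            if delta > time_out:
                # Assumption: a gap above time_out starts a new session.
                session_id += 1
        kept.append((ts, session_id))
        last_valid = ts
    return kept


if __name__ == "__main__":
    # Events at 1 and 2 are dropped (less than 2.5 after the valid event at 0).
    # The event at 3 is kept because it is compared against 0, not against 2.
    print(sessionize_events([0, 1, 2, 3, 10, 30], time_out=15, min_time=2.5))
```

With min_time left at its default of 0, no event is dropped and the result 
depends only on time_out.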

For an example of how min_time could work, see Aster Analytics sessionization 
function [1].

References

[1] Aster Analytics users guide, see "sessionize" function
http://www.info.teradata.com/edownload.cfm?itemid=143450001
http://www.info.teradata.com/templates/eSrchResults.cfm?txtpid=&txtrelno=&prodline=all&frmdt=&txtsrchstring=aster%20analytics&srtord=Desc&todt=&rdSort=Date
https://www.youtube.com/watch?v=C760M9ttK9Q

  was:
Story

As a data scientist, I want to perform session reconstruction on my data set, 
so that I can prepare for input into other algorithms like path functions, or 
predictive analytics algorithms.

This is a follow on to 
https://issues.apache.org/jira/browse/MADLIB-909
https://issues.apache.org/jira/browse/MADLIB-1001
to add minimum time.

Details 

Add min time to the existing params:

Proposed interface changes:

{code}
sessionize (
   source_table,
   output_table,
   partition_expr,
   order_expr,
   time_stamp,
   time_out,
   min_time,   -- new
   output_all_cols,
   create_view 
   )
{code}
where

min_time (optional)
Minimum delta time that must elapse for an event to be considered a valid event 
(default=0).  If an event happens in less than min_time since the last valid 
event, it does not get included in the current session and is dropped.   Same 
units as time_stamp.

Implementation notes

1) Should be specified in the same units as the time_out parameter.   
2) Always compare against the last valid session event, not against one(s) that 
just got dropped.

For an example of how min_time could work, see Aster Analytics sessionization 
function [1].

References

[1] Aster Analytics users guide, see "sessionize" function
http://www.info.teradata.com/edownload.cfm?itemid=143450001
http://www.info.teradata.com/templates/eSrchResults.cfm?txtpid=&txtrelno=&prodline=all&frmdt=&txtsrchstring=aster%20analytics&srtord=Desc&todt=&rdSort=Date
https://www.youtube.com/watch?v=C760M9ttK9Q


> Sessionization - Phase 3 (minimum time)
> ---------------------------------------
>
>                 Key: MADLIB-1002
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1002
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Priority: Minor
>              Labels: gsoc2016, starter
>             Fix For: v1.9.2
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
