[ 
https://issues.apache.org/jira/browse/MADLIB-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1002:
------------------------------------
    Description: 
Story

As a data scientist, I want to perform session reconstruction on my data set, 
so that I can prepare for input into other algorithms like path functions, or 
predictive analytics algorithms.

This is a follow-on to 
https://issues.apache.org/jira/browse/MADLIB-909
https://issues.apache.org/jira/browse/MADLIB-1001
to add minimum time.

Details 

Add min_time to the existing parameters.

Proposed interface changes:

{code}
sessionize (
   source_table,
   output_table,
   partition_expr,
   order_expr,
   time_stamp,
   time_out,
   min_time,   -- new
   output_all_cols,
   create_view 
   )
{code}
where

min_time (optional)
Minimum time that must elapse since the last valid event for an event to be 
considered valid (default=0).  An event that occurs less than min_time after 
the last valid event is dropped and is not included in the current session.  
Same units as time_stamp.

Implementation notes

1) min_time should be specified in the same units as the time_out parameter.
2) Always compare against the last valid session event, not against events 
that were just dropped.

For an example of how min_time could work, see Aster Analytics sessionization 
function [1].
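
The min_time rule in implementation note 2 can be sketched in Python as 
follows.  This is only an illustration of the intended semantics, not the 
MADlib implementation: the function name drop_events_within_min_time and the 
sample events are hypothetical, and the sketch assumes the events of a single 
partition are already sorted by time stamp and that the first event is always 
valid.

```python
from datetime import datetime, timedelta


def drop_events_within_min_time(timestamps, min_time):
    """Return the time stamps that survive a min_time filter.

    Assumes `timestamps` holds one partition's events in ascending order.
    An event is kept only if at least `min_time` has elapsed since the
    last *kept* event; the first event is always kept (assumption).
    """
    kept = []
    last_valid = None
    for ts in timestamps:
        if last_valid is None or ts - last_valid >= min_time:
            kept.append(ts)
            last_valid = ts  # only valid events move the reference point
        # Otherwise the event is dropped and last_valid stays unchanged,
        # so the next event is still compared against the last valid one.
    return kept


# Hypothetical sample: events at 0, 3, 6, 12, 14 and 25 seconds.
base = datetime(2016, 1, 1, 10, 0, 0)
events = [base + timedelta(seconds=s) for s in (0, 3, 6, 12, 14, 25)]
kept = drop_events_within_min_time(events, timedelta(seconds=5))
kept_offsets = [int((ts - base).total_seconds()) for ts in kept]
print(kept_offsets)  # [0, 6, 12, 25]
```

With min_time of 5 seconds, the event at 6 seconds is kept because it is 
compared against the valid event at 0 seconds, not against the dropped event 
at 3 seconds.  With the default of 0, no event is dropped.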

References

[1] Aster Analytics users guide, see "sessionize" function
http://www.info.teradata.com/edownload.cfm?itemid=143450001
http://www.info.teradata.com/templates/eSrchResults.cfm?txtpid=&txtrelno=&prodline=all&frmdt=&txtsrchstring=aster%20analytics&srtord=Desc&todt=&rdSort=Date
https://www.youtube.com/watch?v=C760M9ttK9Q

  was:
Story

As a data scientist, I want to perform session reconstruction on my data set, 
so that I can prepare for input into other algorithms like path functions, or 
predictive analytics algorithms.

This is a follow on to 
https://issues.apache.org/jira/browse/MADLIB-909
https://issues.apache.org/jira/browse/MADLIB-1001
to add minimum time.

Details 

Add min time to the existing params:

Proposed interface changes:

{code}
sessionize (
   source_table,
   output_table,
   partition_expr,
   order_expr,
   time_stamp,
   time_out,
   min_time   -- new
   output_all_cols,
   create_view 
   )
{code}
where

min_time (optional)
FLOAT   Minimum delta time that must elapse for an event to be considered a 
valid event (default=0).  If an event happens in less than min_time since the 
last valid event, it does not get included in the current session and is 
dropped. 

Implementation notes

1) Should be specified in the same units as the time_out parameter.   
2) Always compare against the last valid session event, not against one(s) that 
just got dropped.

For an example of how min_time could work, see Aster Analytics sessionization 
function [1].

References

[1] Aster Analytics users guide, see "sessionize" function
http://www.info.teradata.com/edownload.cfm?itemid=143450001
http://www.info.teradata.com/templates/eSrchResults.cfm?txtpid=&txtrelno=&prodline=all&frmdt=&txtsrchstring=aster%20analytics&srtord=Desc&todt=&rdSort=Date
https://www.youtube.com/watch?v=C760M9ttK9Q


> Sessionization - Phase 3 (minimum time)
> ---------------------------------------
>
>                 Key: MADLIB-1002
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1002
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Priority: Minor
>              Labels: gsoc2016, starter
>             Fix For: v1.9.2
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
