[ https://issues.apache.org/jira/browse/MADLIB-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1001:
------------------------------------
Description:
Story
As a data scientist, I want to perform session reconstruction on my data set,
so that I can prepare for input into other algorithms like path functions, or
predictive analytics algorithms.
This is a follow-on to
https://issues.apache.org/jira/browse/MADLIB-909
that adds optional output controls.
Details
Proposed interface changes:
{code}
sessionize (
    source_table,
    output_table,
    partition_expr,
    time_stamp,
    max_time,
    output_cols,  -- new
    create_view   -- new
)
{code}
where
output_cols (optional)
TEXT, default: '*'. Controls which columns appear in the output:
'*' -- ALL columns in the input table + session column (default).
'x, y, z, ...' -- the listed columns + session column. The list can include
the partition expression or other expressions as desired.
'*, expr1, expr2, ...' -- should also be supported, meaning all columns + the
extra expressions listed.
The value must be a valid SELECT expression.
The path function does something similar for its aggregate function
parameter; see the examples at
http://madlib.incubator.apache.org/docs/latest/group__grp__path.html#examples
create_view (optional)
BOOLEAN, default: TRUE. Determines whether the output is created as a view or
materialized as a table. If you only need the session info once, creating a
view could be significantly faster than materializing a table.
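To illustrate how output_cols and create_view could shape the generated statement, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL/MADlib. Everything in it is an assumption for illustration, not MADlib's implementation: the helper name sessionize_sqlite, the table and column names (events, revenue, sess_table, sess_view, large_sale), integer timestamps, plain column names for the partition and time arguments, and the rule that a row starts a new session when the gap to the previous row in its partition is greater than max_time.

```python
import sqlite3


def sessionize_sqlite(conn, source_table, output_table, partition_col,
                      time_col, max_time, output_cols="*", create_view=True):
    """Hypothetical SQLite stand-in for the proposed sessionize() options.

    Assumptions (not MADlib behavior): partition_col and time_col are plain
    column names, time_col holds integers, a row starts a new session when it
    has no earlier row in its partition or the gap to the previous row is
    greater than max_time, and session ids count up from 1 per partition.
    """
    # output_cols is spliced verbatim into the SELECT list, so it must be a
    # valid SELECT expression list; '*' (the default) means all input columns.
    cols = (output_cols or "").strip() or "*"
    # Previous timestamp of row b within its partition (NULL for the first row).
    prev_ts = (f"(SELECT MAX(c.{time_col}) FROM {source_table} c "
               f"WHERE c.{partition_col} = b.{partition_col} "
               f"AND c.{time_col} < b.{time_col})")
    # Session id of row e = number of session-starting rows at or before it.
    # Correlated subqueries avoid needing window functions; slow, sketch only.
    session_expr = (f"(SELECT COUNT(*) FROM {source_table} b "
                    f"WHERE b.{partition_col} = e.{partition_col} "
                    f"AND b.{time_col} <= e.{time_col} "
                    f"AND ({prev_ts} IS NULL "
                    f"OR b.{time_col} - {prev_ts} > {int(max_time)}))")
    # create_view picks between a view (lazy) and a materialized table.
    kind = "VIEW" if create_view else "TABLE"
    conn.execute(f"CREATE {kind} {output_table} AS "
                 f"SELECT {cols}, {session_expr} AS session_id "
                 f"FROM {source_table} e")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, ts INTEGER, revenue REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(1, 0, 0.0), (1, 2, 5.0), (1, 40, 30.0), (2, 10, 0.0)])

# Default output_cols ('*'), materialized as a table.
sessionize_sqlite(conn, "events", "sess_table", "user_id", "ts", 30,
                  create_view=False)
# '*' plus an extra expression, created as a view (the default).
sessionize_sqlite(conn, "events", "sess_view", "user_id", "ts", 30,
                  output_cols="*, revenue > 20 AS large_sale")

rows = conn.execute("SELECT user_id, ts, session_id FROM sess_table "
                    "ORDER BY user_id, ts").fetchall()
# rows == [(1, 0, 1), (1, 2, 1), (1, 40, 2), (2, 10, 1)]

# A row inserted afterwards is visible through the view, which is re-evaluated
# on every read, but not in the table, which was filled once at creation.
conn.execute("INSERT INTO events VALUES (2, 15, 25.0)")
view_count = conn.execute("SELECT COUNT(*) FROM sess_view").fetchone()[0]    # 5
table_count = conn.execute("SELECT COUNT(*) FROM sess_table").fetchone()[0]  # 4
```

The last two counts show the trade-off behind create_view: a view costs nothing to create but repeats the session computation on every read, while a table pays that cost once. In PostgreSQL the session column would more naturally be computed with window functions (for example LAG over the partition ordered by the timestamp, then a running SUM of new-session flags); the correlated subqueries here only keep the sketch runnable on any SQLite build.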
was:
Story
As a data scientist, I want to perform session reconstruction on my data set,
so that I can prepare for input into other algorithms like path functions, or
predictive analytics algorithms.
This is a follow on to
https://issues.apache.org/jira/browse/MADLIB-909
to add optional output controls.
Details
Proposed interface changes:
{code}
sessionize (
source_table,
output_table,
partition_expr,
time_stamp,
max_time,
output_cols -- new
create_view -- new
)
{code}
where
output_cols (optional)
TEXT.
asterisk (i.e., '*') -- ALL columns in input table + session column (default)
'x, y, z, ...' -- list of columns you want + session column. This list could
include the partition expression or other expressions as desired. This should
also support '*, expr1, expr2, etc.' where this means output all columns + the
extra expressions listed.
create_view (optional)
BOOLEAN default: TRUE. Determines whether to create a view or
materialize a table as output. If you only needed session info once, creating a
view could be significantly faster than materializing as a table.
> Sessionization - Phase 2 (output controls)
> ------------------------------------------
>
> Key: MADLIB-1001
> URL: https://issues.apache.org/jira/browse/MADLIB-1001
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Assignee: Nandish Jayaram
> Priority: Minor
> Labels: gsoc2016, starter
> Fix For: v1.9.1
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)