[
https://issues.apache.org/jira/browse/MADLIB-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank McQuillan updated MADLIB-1001:
------------------------------------
Description:
Story
As a data scientist, I want to perform session reconstruction on my data set,
so that I can prepare for input into other algorithms like path functions, or
predictive analytics algorithms.
This is a follow-on to
https://issues.apache.org/jira/browse/MADLIB-909
to add optional output controls.
Details
Proposed interface changes:
params (optional)
TEXT, default: NULL. Parameters for sessionization in a comma-delimited string
of key-value pairs. See the description below for details.
Parameters
Parameters in this section are supplied in the params argument as a string
containing a comma-delimited list of key-value pairs. All of these named
parameters are optional, and their order does not matter. You must use the
format <param_name> = <value> to specify the value of a parameter; otherwise,
the parameter is ignored.
{code}
'output_all_cols = <value>,
create_view = <value>'
{code}
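To make the key-value format concrete, here is a minimal Python sketch of how such a params string could be parsed. It is an illustration only, not the MADlib implementation: the function name parse_params is hypothetical, and two behaviors are assumptions not stated in this issue, namely that only the spellings true/false (case-insensitive) are accepted and that unknown names or unrecognized values leave the default in place.

```python
def parse_params(params):
    """Parse 'name = value, name = value' into a dict of boolean options.

    Hypothetical helper for illustration; not part of MADlib.
    """
    # Defaults as given in the parameter descriptions of this issue.
    options = {"output_all_cols": False, "create_view": True}
    if params is None:
        return options

    for item in params.split(","):
        name, sep, value = item.partition("=")
        name = name.strip().lower()
        value = value.strip().lower()
        # Entries that are not in <param_name> = <value> form are ignored.
        # Ignoring unknown names is an assumption of this sketch.
        if not sep or name not in options:
            continue
        # Assumption: only 'true' and 'false' are recognized spellings.
        if value == "true":
            options[name] = True
        elif value == "false":
            options[name] = False
    return options
```

For example, parse_params('output_all_cols = TRUE') would return the defaults with output_all_cols switched on, while an entry lacking the "=" sign would be skipped.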
Parameters
output_all_cols
BOOLEAN, default: FALSE. Controls which columns are output. If FALSE,
only the partition, time stamp and the generated session ID columns are output.
(The assumption is that the partition columns together with the time stamp
column will be sufficient to perform a join with the input table.) If TRUE,
all columns from the source table are output in addition to the generated
session ID.
create_view
BOOLEAN, default: TRUE. Determines whether to create a view or
materialize a table as output. If you only need the session information once,
creating a view can be significantly faster than materializing a table, since
a view stores only the query rather than the result rows.
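The effect of the two options on the generated output can be sketched as follows. This is a rough Python illustration under stated assumptions, not the MADlib code: build_output_sql and all of its arguments are hypothetical names, the session ID expression is passed in by the caller rather than computed here, and identifiers are not quoted or validated.

```python
def build_output_sql(source_table, output_name, partition_cols, time_stamp,
                     session_id_expr, output_all_cols=False, create_view=True):
    """Build a CREATE VIEW / CREATE TABLE statement for the session output.

    Hypothetical helper for illustration; not part of MADlib.
    """
    if output_all_cols:
        # TRUE: every source column plus the generated session ID.
        select_list = "*"
    else:
        # FALSE: only the partition columns and the time stamp column,
        # which are assumed sufficient to join back to the input table.
        select_list = ", ".join(list(partition_cols) + [time_stamp])

    # A view defers the computation to query time; a table materializes it.
    object_type = "VIEW" if create_view else "TABLE"

    return "CREATE {0} {1} AS SELECT {2}, {3} AS session_id FROM {4}".format(
        object_type, output_name, select_list, session_id_expr, source_table)
```

With the defaults, the statement selects only the partition and time stamp columns into a view; setting output_all_cols to TRUE and create_view to FALSE yields a CREATE TABLE statement over all source columns.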
was:
Story
As a data scientist, I want to perform session reconstruction on my data set,
so that I can prepare for input into other algorithms like path functions, or
predictive analytics algorithms.
This is a follow on to
https://issues.apache.org/jira/browse/MADLIB-909
to add optional output controls.
Details
Proposed interface changes:
params (optional)
TEXT, default: NULL. Parameters for sessionization in a comma-delimited string
of key-value pairs. See the description below for details.
Parameters
Parameters in this section are supplied in the params argument as a string
containing a comma-delimited list of key-value pairs. All of these named
parameters are optional, and their order does not matter. You must use the
format <param_name> = <value> to specify the value of a parameter, otherwise
the parameter is ignored.
{code}
‘output_all_cols = <value>,
create_view = <value>’
{code}
Parameters
output_all_cols (Boolean)
BOOLEAN default: FALSE. Controls which columns are output. If FALSE,
only the partition, time stamp and the generated session ID columns are output.
(The assumption is that the partition columns together with the time stamp
column will be sufficient to perform a join with the input table.) If TRUE,
all columns from the source table are output in addition to the generated
session ID.
create_view (Boolean)
BOOLEAN default: TRUE. Determines whether to create a view or
materialize a table as output. If you only needed session info once, creating a
view could be significantly faster than materializing as a table.
> Sessionization - Phase 2 (output controls)
> ------------------------------------------
>
> Key: MADLIB-1001
> URL: https://issues.apache.org/jira/browse/MADLIB-1001
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Assignee: Nandish Jayaram
> Priority: Minor
> Labels: gsoc2016, starter
> Fix For: v1.9.1
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)