[
https://issues.apache.org/jira/browse/MADLIB-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308921#comment-15308921
]
Jim Nasby commented on MADLIB-1001:
-----------------------------------
Instead of `output_all_cols boolean`, a list/array of the columns to actually
output would be more flexible. If the list/array were empty, that would mean
output no additional columns. If it were None/NULL, it would mean output all
columns. The partition, time and session columns would always be output.
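As a rough illustration of those semantics, here is a minimal Python sketch. The function name `resolve_output_columns`, its parameters, and the default session column name `session_id` are hypothetical and not part of the MADlib interface. It only shows how None, an empty list, and an explicit list could be resolved into a final column list.

```python
def resolve_output_columns(source_columns, partition_cols, time_col,
                           output_cols=None, session_col="session_id"):
    """Resolve the output column list under the proposed semantics.

    All names here are hypothetical; this is not MADlib code.
    - output_cols is None  -> every source column is output
    - output_cols == []    -> no columns beyond the required ones
    - otherwise            -> the required columns plus the listed ones
    The partition, time and session columns are always included.
    """
    required = list(partition_cols) + [time_col]

    if output_cols is None:
        requested = list(source_columns)
    else:
        unknown = [c for c in output_cols if c not in source_columns]
        if unknown:
            raise ValueError("unknown columns: " + ", ".join(unknown))
        requested = list(output_cols)

    # Required columns first, then the requested extras, then the session ID,
    # dropping duplicates while preserving order.
    resolved = []
    for col in required + requested + [session_col]:
        if col not in resolved:
            resolved.append(col)
    return resolved


if __name__ == "__main__":
    source = ["user_id", "event_time", "page", "referrer"]
    print(resolve_output_columns(source, ["user_id"], "event_time"))
    print(resolve_output_columns(source, ["user_id"], "event_time", []))
    print(resolve_output_columns(source, ["user_id"], "event_time", ["page"]))
```

With this shape, the boolean in the current proposal maps onto two special cases: `output_all_cols = TRUE` corresponds to None/NULL, and `output_all_cols = FALSE` corresponds to an empty list.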
Also, it's perhaps worth stating in the documentation that both of these
options are provided for performance reasons. There are situations where you
don't need all the data, or you'll only be using the data once so there's no
reason to materialize it.
> Sessionization - Phase 2 (output controls)
> ------------------------------------------
>
> Key: MADLIB-1001
> URL: https://issues.apache.org/jira/browse/MADLIB-1001
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Assignee: Nandish Jayaram
> Priority: Minor
> Labels: gsoc2016, starter
> Fix For: v1.9.1
>
>
> Story
> As a data scientist, I want to perform session reconstruction on my data set,
> so that I can prepare for input into other algorithms like path functions, or
> predictive analytics algorithms.
> This is a follow-on to
> https://issues.apache.org/jira/browse/MADLIB-909
> to add optional output controls.
> Details
> Proposed interface changes:
> {code}
> sessionize (
> source_table,
> output_table,
> partition_expr,
> order_expr,
> time_stamp,
> time_out,
> output_all_cols, -- new
> create_view -- new
> )
> {code}
> where
> output_all_cols
> BOOLEAN default: FALSE. Controls which columns are output. If FALSE,
> only the partition, time stamp and the generated session ID columns are
> output. (The assumption is that the partition columns together with the time
> stamp column will be sufficient to perform a join with the input table.) If
> TRUE, all columns from the source table are output in addition to the
> generated session ID.
> create_view
> BOOLEAN default: TRUE. Determines whether to create a view or
> materialize a table as output. If you need the session info only once,
> creating a view can be significantly faster than materializing a table.