Github user njayaram2 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/49#discussion_r67953497
--- Diff: src/ports/postgres/modules/utilities/sessionize.py_in ---
@@ -35,41 +36,83 @@ def sessionize(schema_madlib, source_table, output_table, partition_expr,
@param source_table: str, Name of the input table/view
@param output_table: str, Name of the table to store result
@param partition_expr: str, Expression to partition (group) the input data
- @param time_stamp: str, Column name with time used for sessionization calculation
+ @param time_stamp: str, The time stamp column name that is used for sessionization calculation
@param max_time: interval, Delta time between subsequent events to define a session
-
+ @param output_cols: str, list of columns the output table/view must contain (default '*'):
+ * - all columns in the input table, and a new session ID column
+ 'a,b,c,...' - a comma separated list of column names/expressions to be projected, along with a new session ID column
--- End diff --
I believe there is precedent for this in other MADlib functions, which is
why we chose a comma-separated string to specify output_cols.
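With the comma-separated form, the code has to split output_cols itself,
and a naive split(',') breaks as soon as a comma appears inside a quoted
identifier, a string literal, or a function call. A minimal sketch of a
top-level comma splitter, for illustration only; the helper name
split_top_level_commas is hypothetical and this is not the code in
sessionize.py_in:

```python
def split_top_level_commas(expr_list):
    """Split a SELECT-list string on commas that are not nested.

    Commas inside double-quoted identifiers, single-quoted literals, or
    parentheses stay part of the current item. Hypothetical helper.
    """
    items = []
    current = []
    depth = 0      # current parenthesis nesting level
    quote = None   # active quote character ('"' or "'"), or None
    i = 0
    while i < len(expr_list):
        ch = expr_list[i]
        if quote is not None:
            current.append(ch)
            if ch == quote:
                # A doubled quote ("" or '') is an escaped quote, not the end
                if i + 1 < len(expr_list) and expr_list[i + 1] == quote:
                    current.append(quote)
                    i += 1
                else:
                    quote = None
        elif ch in ('"', "'"):
            quote = ch
            current.append(ch)
        elif ch == '(':
            depth += 1
            current.append(ch)
        elif ch == ')':
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses in %r" % expr_list)
            current.append(ch)
        elif ch == ',' and depth == 0:
            # Top-level comma: close off the current item
            items.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    if quote is not None or depth != 0:
        raise ValueError("unbalanced quotes or parentheses in %r" % expr_list)
    items.append(''.join(current).strip())
    return items


print(split_top_level_commas('*, "user id"<100, revenue>20'))
# → ['*', '"user id"<100', 'revenue>20']
print(split_top_level_commas('a, f(b, c), "x,y"'))
# → ['a', 'f(b, c)', '"x,y"']
```

Even this sketch ignores PostgreSQL dollar-quoted strings, E'...' backslash
escapes, and SQL comments, which is part of why hand-parsing SELECT lists
tends to get messy.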
One possible change is to require output_cols to be a valid SELECT list,
with the constraint that every expression (as opposed to a plain column name
or '*') must be given an alias using AS. For instance, output_cols would then
look like the following:
'*, "user id"<100 AS uid_100, revenue>20 AS rev_20',
instead of its current form which is:
'*, "user id"<100, revenue>20'
We will have to decide whether this constraint is too restrictive for
users. On the other hand, it would let us remove all the messy string-parsing
code we currently have.
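Under the proposed convention, the Python side no longer needs to tokenize
output_cols: it can splice the string into the SELECT list as-is and let
PostgreSQL report any syntax error. A minimal sketch, assuming a hypothetical
build_output_query helper, a table named events, and session_id as the name
of the new column (none of these come from the PR); the session ID
expression is left as an opaque placeholder:

```python
def build_output_query(source_table, output_cols, session_id_expr):
    """Compose the output query under the proposed AS-alias convention.

    output_cols is spliced in verbatim as the SELECT list: no splitting and
    no generated column names. session_id_expr is a hypothetical placeholder
    for whatever expression computes the session ID.
    """
    return "SELECT {cols}, {sid} AS session_id FROM {src}".format(
        cols=output_cols, sid=session_id_expr, src=source_table)


print(build_output_query(
    "events",
    '*, "user id"<100 AS uid_100, revenue>20 AS rev_20',
    "<session id expression>"))
# → SELECT *, "user id"<100 AS uid_100, revenue>20 AS rev_20, <session id expression> AS session_id FROM events
```

The parsing burden moves to the user, who must alias every expression; that
is exactly the trade-off still to be decided.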