Frank McQuillan created MADLIB-1300:
---------------------------------------

             Summary: Clarify dep and indep var column names in output table for deep learning minibatch preprocessor
                 Key: MADLIB-1300
                 URL: https://issues.apache.org/jira/browse/MADLIB-1300
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Module: Utilities
            Reporter: Frank McQuillan
             Fix For: v1.16


Follow-on to this commit:
Minibatch Preprocessor for Deep learning
https://github.com/apache/madlib/commit/8de32ede33c48d2f4a440f0f639c94a277a359c1

The output table produced by the deep learning mini-batch preprocessor contains
the following columns:

{code}
...
dependent_varname       FLOAT8[]. Packed array of dependent variables. If the 
dependent variable in the source table is categorical, the preprocessor will 
one-hot encode it.
independent_varname     FLOAT8[]. Packed array of independent variables.
...
{code}

This is misleading because these columns contain values, not names, so we should
rename them to:

{code}
...
dependent_var
independent_var
...
{code}

The output summary table contains the following columns: 

{code}
dependent_varname       Dependent variable from the source table.
independent_varname     Independent variable from the source table.
{code}

This is OK since the summary table columns actually do contain names.
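
To make the distinction concrete, here is a minimal Python sketch of the proposed rename. It is illustrative only and is not MADlib code: the helper rename_output_columns, the sample arrays, and the source column names "label" and "features" are all hypothetical. Only the four column names come from this issue.

```python
# Illustrative sketch only; not part of MADlib.
# Old -> new column names proposed in this issue for the output table.
RENAMES = {
    "dependent_varname": "dependent_var",
    "independent_varname": "independent_var",
}


def rename_output_columns(row):
    """Return a copy of an output-table row using the proposed column names."""
    return {RENAMES.get(col, col): value for col, value in row.items()}


# One packed output row (hypothetical data): the columns hold values.
old_row = {
    "dependent_varname": [[1.0, 0.0], [0.0, 1.0]],    # one-hot encoded
    "independent_varname": [[0.1, 0.2], [0.3, 0.4]],
}
new_row = rename_output_columns(old_row)

# The summary table is left unchanged: its columns hold names.
summary_row = {
    "dependent_varname": "label",        # hypothetical source column
    "independent_varname": "features",   # hypothetical source column
}
```

After the rename, the output row is keyed by dependent_var and independent_var and holds the same packed arrays as before, while the summary row keeps the *_varname keys because its values are column names.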

There is a related 2.0 story for the regular mini-batch preprocessor
(http://madlib.apache.org/docs/latest/group__grp__minibatch__preprocessing.html),
tracked in JIRA at https://issues.apache.org/jira/browse/MADLIB-1294





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
