Nandish Jayaram created MADLIB-1333:
---------------------------------------

             Summary: DL: Add new function for preprocessing images for validation dataset
                 Key: MADLIB-1333
                 URL: https://issues.apache.org/jira/browse/MADLIB-1333
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Deep Learning
            Reporter: Nandish Jayaram


Function to prepare the validation dataset for deep learning with MADlib:
 * This function assumes that the preprocessor for the training data has
already been run.
 * Mini-batch x and y.
 * 1-hot encode the class levels. To make sure no class levels are missed (the
validation dataset by itself may not contain every class value that is in the
training dataset), the class values will be read from the summary table output
by the preprocessor for the training data.
 * Normalize using the same normalizing constant that was used to create the
batched training data, found in its summary table.
 * Rename x and y so that the column names for the training data and the
validation data are the same.
 * Applies to fit() and evaluate().
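
As a rough illustration of the steps listed above, here is a minimal pure-Python sketch. It is not the MADlib implementation: the function name `preprocess_validation`, the dictionary keys `independent_var` and `dependent_var`, and the shape of the `summary` dictionary are all assumptions made for the example. The point it shows is that the class values and the normalizing constant come from the training preprocessor's summary, not from the validation data itself.

```python
def preprocess_validation(x_rows, y_labels, class_values, normalizing_const,
                          buffer_size):
    """Normalize x, 1-hot encode y against the TRAINING class values,
    and group the rows into mini-batches of at most buffer_size rows.

    All names here are hypothetical; this is only a sketch of the idea.
    """
    if len(x_rows) != len(y_labels):
        raise ValueError("x and y must have the same number of rows")

    # Position of each class value, taken from the training summary so that
    # classes missing from the validation data still get a slot.
    index = {value: i for i, value in enumerate(class_values)}

    # Reuse the training normalizing constant.
    x_norm = [[v / normalizing_const for v in row] for row in x_rows]

    y_onehot = []
    for label in y_labels:
        if label not in index:
            raise ValueError("unknown class value: {0!r}".format(label))
        vec = [0] * len(class_values)
        vec[index[label]] = 1
        y_onehot.append(vec)

    # Mini-batch, using the same (assumed) column names for x and y that
    # the training output would use.
    batches = []
    for start in range(0, len(x_norm), buffer_size):
        batches.append({
            "independent_var": x_norm[start:start + buffer_size],
            "dependent_var": y_onehot[start:start + buffer_size],
        })
    return batches


# Hypothetical summary values produced by the training preprocessor.
summary = {"class_values": ["cat", "dog", "bird"], "normalizing_const": 255.0}

batches = preprocess_validation(
    x_rows=[[0, 255], [51, 102], [255, 0]],
    y_labels=["dog", "cat", "dog"],   # "bird" never appears here
    class_values=summary["class_values"],
    normalizing_const=summary["normalizing_const"],
    buffer_size=2,
)
```

Even though "bird" does not occur in the validation labels, every 1-hot vector still has three entries, so the encoded validation data lines up with the encoded training data.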

Proposed Interface:
 Rename `minibatch_preprocessor_dl` to `training_preprocessor_dl`. The interface
is otherwise the same as it currently is in master:
{code:java}
training_preprocessor_dl( source_table,         -- training dataset
                          output_table,
                          dependent_varname,
                          independent_varname,
                          buffer_size,          -- Optional
                          normalizing_const,    -- Optional
                          num_classes           -- Optional
                        )
{code}
New function for preparing validation data for evaluation:
{code:java}
validation_preprocessor_dl(
      source_table,                 -- validation dataset
      output_table,
      dependent_varname,
      independent_varname,
      training_preprocessor_table,  -- i.e., from training_preprocessor_dl
      buffer_size                   -- Optional
)
{code}
Acceptance:
 1. Input validation check to ensure `training_preprocessor_table` is not null.
 2. Run `validation_preprocessor_dl` on the exact same dataset as
`training_preprocessor_dl` and ensure that the respective output tables are the
same element-by-element. This test may only be verifiable if there is exactly
one image in the input table.
 3. Set `buffer_size` in `validation_preprocessor_dl` to a value less than 1 and
ensure that it fails with a clear error message.
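
The input checks in acceptance items 1 and 3 could look roughly like the following Python sketch. The helper name `validate_validation_args` and the error message wording are assumptions for illustration only, not the actual MADlib code.

```python
def validate_validation_args(training_preprocessor_table, buffer_size=None):
    """Hypothetical argument checks for validation_preprocessor_dl."""
    # Item 1: the training preprocessor table must be supplied.
    if (training_preprocessor_table is None
            or not str(training_preprocessor_table).strip()):
        raise ValueError(
            "validation_preprocessor_dl: training_preprocessor_table "
            "must not be NULL or empty.")

    # Item 3: buffer_size is optional, but if given it must be at least 1.
    if buffer_size is not None and buffer_size < 1:
        raise ValueError(
            "validation_preprocessor_dl: buffer_size must be at least 1, "
            "got {0}.".format(buffer_size))


def error_message(*args, **kwargs):
    """Return the error text for the given arguments, or None if valid."""
    try:
        validate_validation_args(*args, **kwargs)
    except ValueError as exc:
        return str(exc)
    return None
```

For example, `error_message("train_packed", buffer_size=0)` returns a message that names both the function and the offending value, while `error_message("train_packed")` returns `None` because `buffer_size` is optional.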



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
