[ https://issues.apache.org/jira/browse/MADLIB-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nandish Jayaram reassigned MADLIB-1333:
---------------------------------------
Assignee: Nandish Jayaram
> DL: Add new function for preprocessing images for validation dataset
> --------------------------------------------------------------------
>
> Key: MADLIB-1333
> URL: https://issues.apache.org/jira/browse/MADLIB-1333
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Deep Learning
> Reporter: Nandish Jayaram
> Assignee: Nandish Jayaram
> Priority: Major
>
> Function to prepare the validation dataset for deep learning with MADlib:
> * This function assumes that the preprocessor for the training data has
> already been run.
> * Mini-batch x and y (the independent and dependent variables).
> * 1-hot encode the class levels, making sure no class level is missed in
> case the validation dataset by itself does not contain all of the class
> values present in the training dataset. The class levels will be read from
> the summary table produced by the training data preprocessor.
> * Normalizing: use the same normalizing constant that was used when creating
> the batched training data, found in its summary table.
> * Rename x and y so that the column names for the training and validation
> data are the same.
> * Applies to fit() and evaluate().
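A minimal Python sketch of the steps above, assuming the training preprocessor's summary exposes the class levels and the normalizing constant under the hypothetical keys `class_values` and `normalizing_const`, and that the output columns are named `independent_var` and `dependent_var` (also an assumption). Images are flattened pixel lists, and rows are batched in input order without shuffling. Treating a label that never appeared in training as an error is a choice made for this sketch only; the ticket does not specify that behavior.

```python
from typing import Dict, List, Sequence, Tuple


def preprocess_validation(
    rows: Sequence[Tuple[List[float], str]],  # (flattened pixels, label) pairs
    summary: Dict[str, object],               # hypothetical training summary
    buffer_size: int,
) -> List[Dict[str, list]]:
    """Illustrative sketch only; not MADlib's actual implementation or API."""
    if buffer_size < 1:
        raise ValueError(f"buffer_size must be >= 1, got {buffer_size}")

    # Class levels come from the *training* data, so a class missing from
    # the validation set still gets its own 1-hot position.
    class_values = list(summary["class_values"])
    index = {c: i for i, c in enumerate(class_values)}
    # Reuse the training normalizing constant instead of recomputing one.
    norm = float(summary["normalizing_const"])

    buffers = []
    # No shuffling: validation rows are batched in their original order.
    for start in range(0, len(rows), buffer_size):
        xs, ys = [], []
        for pixels, label in rows[start:start + buffer_size]:
            if label not in index:  # sketch-specific choice (see lead-in)
                raise ValueError(f"class {label!r} not seen in training data")
            xs.append([p / norm for p in pixels])
            one_hot = [0] * len(class_values)
            one_hot[index[label]] = 1
            ys.append(one_hot)
        # Same (assumed) column names as the training output table.
        buffers.append({"independent_var": xs, "dependent_var": ys})
    return buffers
```

For example, with class levels `["cat", "dog", "bird"]` in the training summary, a validation set containing only cats and dogs still produces 3-wide 1-hot vectors, which is the case the bullet about missing class levels guards against.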
> Proposed interface:
> Rename `minibatch_preprocessor_dl` to `training_preprocessor_dl`. The
> interface is the same as the one currently in master:
> {code:sql}
> training_preprocessor_dl(
>     source_table,          -- training dataset
>     output_table,
>     dependent_varname,
>     independent_varname,
>     buffer_size,           -- Optional
>     normalizing_const,     -- Optional
>     num_classes            -- Optional
> )
> {code}
> New function for preparing validation data for evaluation:
> {code:sql}
> validation_preprocessor_dl(
>     source_table,                  -- validation dataset
>     output_table,
>     dependent_varname,
>     independent_varname,
>     training_preprocessor_table,   -- i.e., from training_preprocessor_dl
>     buffer_size                    -- Optional
> )
> {code}
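The two argument checks listed under Acceptance (a non-null `training_preprocessor_table` and a `buffer_size` of at least 1) could look roughly like the following Python sketch. The function name `validate_validation_args` and the use of a plain `ValueError` in place of a database error are assumptions for illustration; the real MADlib code may report these errors differently.

```python
from typing import Optional


def validate_validation_args(
    training_preprocessor_table: Optional[str],
    buffer_size: Optional[int] = None,
) -> None:
    """Hypothetical input checks for validation_preprocessor_dl."""
    # The training preprocessor output is required: it supplies the class
    # levels and normalizing constant used for the validation data.
    if training_preprocessor_table is None or not training_preprocessor_table.strip():
        raise ValueError(
            "validation_preprocessor_dl: training_preprocessor_table "
            "must not be NULL or empty")
    # buffer_size is optional; when it is given it must be at least 1.
    if buffer_size is not None and buffer_size < 1:
        raise ValueError(
            "validation_preprocessor_dl: buffer_size must be >= 1, "
            f"got {buffer_size}")
```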
> Note:
> 1. {{validation_preprocessor_dl}} does not need to randomize (shuffle) the
> data.
> Acceptance:
> 1. Input validation check to ensure `training_preprocessor_table` is not
> null.
> 2. Run validation_preprocessor_dl and training_preprocessor_dl on toy
> datasets of 5-10 fake low-resolution images (e.g., 2x2). Manually check that
> both sets are normalized the same way and 1-hot encoded the same way, and that
> all rows are present in the output tables (the ordering will of course
> differ, since the training data is randomized and the validation data is not).
> 3. Set `buffer_size` in `validation_preprocessor_dl` to a value < 1 and
> ensure it fails with a clear error message.
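Acceptance item 2 can be approximated outside the database with a toy Python check like the one below. It is a sketch under stated assumptions: `preprocess` is a hypothetical unbatched stand-in for either preprocessor, the 2x2 images are flattened to four pixel values, and the validation images deliberately repeat two training images so their normalized values and 1-hot labels can be compared directly.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[List[int], str]  # (flattened 2x2 pixels, class label)


def one_hot(label: str, class_values: Sequence[str]) -> List[int]:
    # Encode against the *training* class levels so both sets share a width.
    return [1 if label == c else 0 for c in class_values]


def preprocess(rows: Sequence[Row], class_values: Sequence[str], norm: float,
               shuffle: bool, seed: int = 0):
    """Hypothetical stand-in for either preprocessor (no batching)."""
    rows = list(rows)
    if shuffle:  # training data is randomized; validation data is not
        random.Random(seed).shuffle(rows)
    return [([p / norm for p in pixels], one_hot(label, class_values))
            for pixels, label in rows]


train: List[Row] = [([0, 51, 102, 255], "a"), ([255, 0, 0, 255], "b"),
                    ([51, 51, 51, 51], "c"), ([10, 20, 30, 40], "a"),
                    ([200, 100, 50, 0], "b")]
val: List[Row] = [([0, 51, 102, 255], "a"), ([255, 0, 0, 255], "b")]  # no "c"

class_values = sorted({label for _, label in train})  # from training summary
norm = 255.0                                           # shared constant

train_out = preprocess(train, class_values, norm, shuffle=True)
val_out = preprocess(val, class_values, norm, shuffle=False)


def key(pair):
    x, y = pair
    return (tuple(x), tuple(y))


# All training rows are present; only the order may differ.
unshuffled = preprocess(train, class_values, norm, shuffle=False)
assert sorted(map(key, train_out)) == sorted(map(key, unshuffled))
# Validation keeps input order and uses the full training label width.
assert [y for _, y in val_out] == [one_hot(l, class_values) for _, l in val]
# Identical images get identical normalized x and 1-hot y in both sets.
train_keys = set(map(key, train_out))
assert all(key(r) in train_keys for r in val_out)
```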