[ 
https://issues.apache.org/jira/browse/MADLIB-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandish Jayaram reassigned MADLIB-1333:
---------------------------------------

    Assignee: Nandish Jayaram

> DL: Add new function for preprocessing images for validation dataset
> --------------------------------------------------------------------
>
>                 Key: MADLIB-1333
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1333
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>            Reporter: Nandish Jayaram
>            Assignee: Nandish Jayaram
>            Priority: Major
>
> Function to prepare the validation dataset for deep learning with MADlib:
>  * This function assumes that the preprocessor for training data has already 
> been run.
>  * Mini-batch x and y.
>  * 1-hot encode class levels, making sure not to miss any class levels (the 
> validation dataset by itself may not contain all class values that are in 
> the training dataset). The class values will be read from the summary table 
> output by the preprocessor for training data.
>  * Normalize using the same normalizing constant that was used while 
> creating the batched training data, found in its summary table.
>  * Rename x and y so that the column names for training data and validation 
> data are the same.
>  * Applies to fit() and evaluate().
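
The steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the MADlib implementation: the helper names, the in-memory row format, and the output keys `independent_var` / `dependent_var` are hypothetical, and `class_values` and `normalizing_const` stand in for values that would be read from the training preprocessor's summary table. Rejecting a label that was never seen in training is also a choice made for this sketch; the issue does not specify that behaviour.

```python
def one_hot(label, class_values):
    """1-hot encode `label` against the class levels recorded for training."""
    if label not in class_values:
        # Assumption: an unseen validation label is treated as an error.
        raise ValueError("label %r is not among the training class values" % (label,))
    return [1 if label == c else 0 for c in class_values]


def prepare_validation_batches(rows, class_values, normalizing_const, buffer_size):
    """Batch (pixels, label) rows in their original order, without shuffling.

    rows: list of (pixels, label) pairs; pixels is a flat list of numbers.
    """
    batches = []
    for start in range(0, len(rows), buffer_size):
        chunk = rows[start:start + buffer_size]
        # Same normalizing constant as the training data.
        x = [[p / normalizing_const for p in pixels] for pixels, _ in chunk]
        # Same class levels as the training data, even if some are absent here.
        y = [one_hot(label, class_values) for _, label in chunk]
        # Hypothetical column names, shared with the training output.
        batches.append({"independent_var": x, "dependent_var": y})
    return batches


# Toy 2x2 "images"; the validation rows contain no "bird" example.
training_classes = ["cat", "dog", "bird"]
val_rows = [
    ([0, 255, 128, 64], "dog"),
    ([255, 255, 0, 0], "cat"),
    ([51, 102, 153, 204], "dog"),
]
batches = prepare_validation_batches(val_rows, training_classes, 255.0, 2)
```

Because the encoding width comes from the training class levels, every label vector has three entries even though the validation rows never mention "bird".
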
> Proposed Interface:
>  Rename `minibatch_preprocessor_dl` to `training_preprocessor_dl`. The 
> interface is the same as currently in master:
> {code:java}
> training_preprocessor_dl( source_table,         -- training dataset
>                           output_table,
>                           dependent_varname,
>                           independent_varname,
>                           buffer_size,          -- Optional
>                           normalizing_const,    -- Optional
>                           num_classes           -- Optional
>                         )
> {code}
> New function for preparing validation data for evaluation:
> {code:java}
> validation_preprocessor_dl(
>       source_table,  -- validation dataset
>       output_table,  
>       dependent_varname,
>       independent_varname,
>       training_preprocessor_table,  -- i.e., from training_preprocessor_dl
>       buffer_size             -- Optional
> )
> {code}
> Note:
>  1. {{validation_preprocessor_dl}} does not need to randomize.
> Acceptance:
>  1. Input validation check to ensure `training_preprocessor_table` is not 
> null.
>  2. Run validation_preprocessor_dl and training_preprocessor_dl on some toy 
> datasets of 5-10 fake low-resolution images, e.g., 2x2. Manually check that 
> both sets are normalized the same way, 1-hot encoded the same way, and that 
> all rows are present in the output tables (the ordering will differ, since 
> training data is randomized and validation data is not).
>  3. Set `buffer_size` in `validation_preprocessor_dl` to a value <1 and 
> ensure it fails with a clear error message.
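
Acceptance items 1 and 3 amount to argument checks that run before any data is touched. A minimal sketch follows, assuming a hypothetical helper `validate_args` and illustrative error wording; neither is taken from the MADlib code base. `buffer_size=None` models the optional argument being omitted.

```python
def validate_args(training_preprocessor_table, buffer_size=None):
    """Raise ValueError with a readable message if an argument is invalid."""
    if training_preprocessor_table is None:
        raise ValueError(
            "validation_preprocessor_dl: training_preprocessor_table "
            "must not be NULL."
        )
    # buffer_size is optional; only check it when the caller supplied one.
    if buffer_size is not None and buffer_size < 1:
        raise ValueError(
            "validation_preprocessor_dl: buffer_size must be at least 1, "
            "got {0}.".format(buffer_size)
        )


def error_message(*args):
    """Return the error text for the given arguments, or None if they pass."""
    try:
        validate_args(*args)
    except ValueError as exc:
        return str(exc)
    return None
```

For example, `error_message("train_packed", 0)` returns the buffer_size message, while `error_message("train_packed")` returns `None`.
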



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
