[ 
https://issues.apache.org/jira/browse/MADLIB-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380827#comment-16380827
 ] 

Nandish Jayaram commented on MADLIB-1206:
-----------------------------------------

Based on the output of the preprocessing step in 
https://issues.apache.org/jira/browse/MADLIB-1200, MLP should decide whether 
to use mini-batch, using some basic checks:

Check that <preprocessed_table_name>_summary and 
<preprocessed_table_name>_standardization exist, and inspect their column 
names, to verify whether the data has been preprocessed. If it has, use 
mini-batch; otherwise use regular IGD.
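
The check above could be sketched roughly as follows. This is only an illustration, not MADlib code: the function name, the catalog dictionary, and the required column names are all hypothetical, since this comment does not list which columns to look for.

```python
# Hypothetical column names: the comment does not say which columns
# must be present, so these are placeholders for illustration only.
REQUIRED_COLUMNS = {
    "_summary": {"buffer_size", "classes"},
    "_standardization": {"mean", "std"},
}


def choose_solver(source_table, columns_by_table):
    """Pick the solver for MLP based on the preprocessing output tables.

    columns_by_table maps each existing table name to the set of its
    column names (a stand-in for a catalog lookup).
    Returns "minibatch" if both companion tables exist and contain the
    expected columns, otherwise "igd".
    """
    for suffix, required in REQUIRED_COLUMNS.items():
        columns = columns_by_table.get(source_table + suffix)
        # A missing table or a missing column means "not preprocessed".
        if columns is None or not required <= set(columns):
            return "igd"
    return "minibatch"
```

In a real implementation the dictionary would be replaced by queries against the database catalog; it is used here only to keep the sketch self-contained.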

Other information we should get from the preprocessing step:
 # The mean and standard deviation of the independent variable.
 # Whether the data was preprocessed for classification or regression, 
determined by looking at the column named `classes` in 
<preprocessed_table_name>_summary.
 # The original input table name, the independent/dependent variable names, 
and the grouping columns, from <preprocessed_table_name>_summary.
 # The buffer size from <preprocessed_table_name>_summary, used to validate 
the batch_size for MLP mini-batch.
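
As a rough sketch of the last two points, the summary row could be read as below. Both rules are assumptions, because this comment does not spell them out: a non-NULL `classes` value is taken to mean classification, and batch_size is taken to be valid when it is a positive integer no larger than the buffer size. The function names and the dictionary form of the summary row are hypothetical.

```python
def task_type(summary_row):
    """Guess the task from one row of <preprocessed_table_name>_summary.

    Assumption: a non-NULL `classes` value marks classification;
    a missing or NULL value marks regression.
    """
    if summary_row.get("classes") is not None:
        return "classification"
    return "regression"


def validate_batch_size(batch_size, buffer_size):
    """Return batch_size if it passes the assumed check, else raise.

    Assumption: batch_size must be a positive integer that does not
    exceed the buffer size recorded by the preprocessing step.
    """
    if not isinstance(batch_size, int) or batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    if batch_size > buffer_size:
        raise ValueError("batch_size must not exceed buffer_size")
    return batch_size
```

If the actual rule turns out to differ (for example, capping batch_size at the buffer size instead of rejecting it), only validate_batch_size would need to change.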

> Add mini batch based gradient descent support to MLP
> ----------------------------------------------------
>
>                 Key: MADLIB-1206
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1206
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Neural Networks
>            Reporter: Nandish Jayaram
>            Assignee: Nandish Jayaram
>            Priority: Major
>             Fix For: v1.14
>
>
> Mini-batch gradient descent is typically the algorithm of choice when 
> training a neural network.
> MADlib currently supports IGD; we may have to add extensions to include 
> mini-batch as a solver for MLP. Other modules will continue to use the 
> existing IGD that does not support mini-batching. Later JIRAs will move other 
> modules over one at a time to use the new mini-batch GD.
> Related JIRA that will pre-process the input data to be consumed by 
> mini-batch is https://issues.apache.org/jira/browse/MADLIB-1200



