[https://issues.apache.org/jira/browse/MADLIB-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114823#comment-16114823]
Cooper Sloan commented on MADLIB-413:
-------------------------------------
Yep, that should be fixed now.
A few notes about our approach. So far we have taken the user table as-is and
relied on the user to distribute the data well.
One additional approach we can take is to copy the user input data into a new
temp table. This incurs the overhead of copying the entire table, which may be
very large, but it allows us to do a few more things that could be very useful:
1: Partition the data randomly instead of relying on the user's distribution policy.
2: Give each segment an overlapping subset of the data, as suggested in
https://www.microsoft.com/en-us/research/wp-content/uploads/2013/12/Accelerating-RNN-training.pdf
3: Use different batches between iterations, which the Microsoft paper seems to
imply, though it is not completely clear from my reading.
I think this would be a good candidate for phase 3.
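As a sketch of option 1, on Greenplum the copy-and-redistribute step could look like the following (table name is a placeholder, not part of the current implementation):

{code}
-- Copy the user's input table into a session-local temp table,
-- letting the database scatter rows randomly across segments
-- instead of following the user's distribution policy.
CREATE TEMP TABLE mlp_input_shuffled AS
SELECT * FROM user_input_table
DISTRIBUTED RANDOMLY;
{code}

This keeps the original table untouched, at the cost of one full copy of the data.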
> Neural Networks - MLP - Phase 1
> -------------------------------
>
> Key: MADLIB-413
> URL: https://issues.apache.org/jira/browse/MADLIB-413
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Neural Networks
> Reporter: Caleb Welton
> Assignee: Cooper Sloan
> Fix For: v1.12
>
> Attachments: screenshot-1.png
>
>
> Multilayer perceptron with backpropagation
> Modules:
> * mlp_classification
> * mlp_regression
> Interface
> {code}
> source_table VARCHAR,
> output_table VARCHAR,
> independent_varname VARCHAR, -- Column name for input features, should be a real-valued array
> dependent_varname VARCHAR, -- Column name for target values, should be a real-valued array of size 1 or greater
> hidden_layer_sizes INTEGER[], -- Number of units per hidden layer (can be empty or NULL, in which case there are no hidden layers)
> optimizer_params VARCHAR, -- Specified below
> weights VARCHAR, -- Column name for weights. Weights the loss for each input vector. Column should contain positive real values
> activation_function VARCHAR, -- One of 'sigmoid' (default), 'tanh', 'relu', or any prefix (e.g. 't', 's')
> grouping_cols
> )
> {code}
> where
> {code}
> optimizer_params: -- eg "step_size=0.5, n_tries=5"
> {
> step_size DOUBLE PRECISION, -- Learning rate
> n_iterations INTEGER, -- Number of iterations per try
> n_tries INTEGER, -- Total number of training cycles, with random initializations to avoid local minima
> tolerance DOUBLE PRECISION, -- Maximum distance between weights before training stops (or until it reaches n_iterations)
> }
> {code}
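For illustration, a call matching the proposed interface might look like this (all table, column, and parameter values below are made up, not from the spec):

{code}
SELECT mlp_classification(
    'iris_data',           -- source_table
    'iris_model',          -- output_table
    'attributes',          -- independent_varname
    'class',               -- dependent_varname
    ARRAY[10, 10],         -- hidden_layer_sizes: two hidden layers of 10 units
    'step_size=0.5, n_iterations=500, n_tries=3, tolerance=0.001',
    NULL,                  -- weights: no per-row loss weighting
    'tanh',                -- activation_function
    NULL                   -- grouping_cols
);
{code}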
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)