[
https://issues.apache.org/jira/browse/FLINK-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585097#comment-15585097
]
Gábor Hermann commented on FLINK-1807:
--------------------------------------
Hi all,
I have a workaround in mind for a "real" SGD. The main idea is to use a
minibatch approach instead of random sampling.
We would split the data into minibatches randomly, then collect every partition
into a single object containing all the data corresponding to that partition.
I.e. we would have something like a {{DataSet[Array[(MiniBatchId,
Array[Array[Double]])]]}},
where every element of this DataSet (i.e. every array) would contain the data
for one partition, and every element of the array would hold that partition's
share of one minibatch. (Actually an {{Array[Array[Array[Double]]]}} is
sufficient to represent a partition.) Then we would have a static DataSet that
represents the data inside the iteration, so we would avoid the problem of
using a dynamic DataSet within an iteration.
At every iteration we would broadcast the model vector, choose a minibatch
(e.g. iteration number modulo the number of minibatches), and calculate the
gradient at every partition based on that minibatch. Then we would aggregate
these partial gradients and update the model vector.
The main drawback of this approach is that we would have to keep all the data
in memory. If that's tolerable, we could make this improvement. What do you
think? Do you see any other disadvantages?
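To make this a bit more concrete, here is a rough, self-contained sketch on the
DataSet API (not FlinkML code; the constants {{numMiniBatches}}, {{learningRate}},
the toy squared-loss gradient, and keeping a label next to the features are only
illustrative assumptions, and I index minibatches by array position instead of an
explicit {{MiniBatchId}}):
{code:scala}
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import scala.collection.JavaConverters._
import scala.util.Random

object MiniBatchSGDSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val numMiniBatches = 4     // assumption: fixed number of minibatches
    val learningRate   = 0.1   // assumption: constant step size
    val numIterations  = 20

    // Toy labeled points: (features, label).
    val points: DataSet[(Array[Double], Double)] = env.fromCollection(
      Seq.fill(100)((Array(Random.nextDouble(), Random.nextDouble()), Random.nextDouble()))
    )

    // Randomly assign each point to a minibatch, then collapse every physical
    // partition into a single element: an array indexed by minibatch id, each
    // entry holding that partition's share of the minibatch. This DataSet is
    // static, so it can be used inside the iteration without being recomputed.
    val partitionedData: DataSet[Array[Array[(Array[Double], Double)]]] =
      points
        .map(p => (Random.nextInt(numMiniBatches), p))
        .mapPartition { it =>
          val byBatch = it.toArray.groupBy(_._1).mapValues(_.map(_._2))
          Iterator.single(
            Array.tabulate(numMiniBatches)(i =>
              byBatch.getOrElse(i, Array.empty[(Array[Double], Double)])))
        }

    // The model is a single weight vector, iterated as a one-element DataSet.
    val initialModel: DataSet[Array[Double]] = env.fromElements(Array(0.0, 0.0))

    val model = initialModel.iterate(numIterations) { currentModel =>
      // One partial gradient per partition, computed on the minibatch chosen
      // for this superstep, with the current model received via broadcast.
      val partialGradients: DataSet[(Array[Double], Long)] = partitionedData
        .map(new RichMapFunction[Array[Array[(Array[Double], Double)]], (Array[Double], Long)] {
          private var weights: Array[Double] = _

          override def open(parameters: Configuration): Unit =
            weights = getRuntimeContext.getBroadcastVariable[Array[Double]]("model").asScala.head

          override def map(partition: Array[Array[(Array[Double], Double)]]): (Array[Double], Long) = {
            // Minibatch choice: iteration number modulo the number of minibatches.
            val batchId = (getIterationRuntimeContext.getSuperstepNumber - 1) % partition.length
            val batch = partition(batchId)
            // Toy squared-loss gradient of a linear model, summed over the local batch.
            val grad = Array.fill(weights.length)(0.0)
            for ((x, y) <- batch) {
              val err = x.zip(weights).map { case (xi, wi) => xi * wi }.sum - y
              for (i <- grad.indices) grad(i) += err * x(i)
            }
            (grad, batch.length.toLong)
          }
        })
        .withBroadcastSet(currentModel, "model")

      // Aggregate the partial gradients, then update the model vector.
      val summedGradient = partialGradients.reduce { (a, b) =>
        (a._1.zip(b._1).map { case (x, y) => x + y }, a._2 + b._2)
      }

      currentModel
        .map(new RichMapFunction[Array[Double], Array[Double]] {
          override def map(w: Array[Double]): Array[Double] = {
            val (grad, count) = getRuntimeContext
              .getBroadcastVariable[(Array[Double], Long)]("gradient").asScala.head
            w.zip(grad).map { case (wi, gi) => wi - learningRate * gi / math.max(count, 1L) }
          }
        })
        .withBroadcastSet(summedGradient, "gradient")
    }

    model.print()
  }
}
{code}
The point is that {{partitionedData}} stays constant across supersteps; only the
one-element model DataSet is iterated and re-broadcast, so no dynamic DataSet is
needed inside the iteration.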
> Stochastic gradient descent optimizer for ML library
> ----------------------------------------------------
>
> Key: FLINK-1807
> URL: https://issues.apache.org/jira/browse/FLINK-1807
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Theodore Vasiloudis
> Labels: ML
>
> Stochastic gradient descent (SGD) is a widely used optimization technique in
> different ML algorithms. Thus, it would be helpful to provide a generalized
> SGD implementation which can be instantiated with the respective gradient
> computation. Such a building block would make the development of future
> algorithms easier.