[
https://issues.apache.org/jira/browse/FLINK-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585716#comment-15585716
]
Gábor Hermann commented on FLINK-1807:
--------------------------------------
Thanks for your reply!
Then, if I understand you correctly, this solution would not be appropriate because
of its excessive memory use.
I believe sampling with a dynamic path could have another significant overhead.
With that approach, we would have to load a sample/mini-batch of the data from
another resource (disk/network) at every iteration step, and that might hurt
performance. Of course, if the sampling read only the needed sample from disk
rather than the whole dataset, the performance would be reasonable. Otherwise, if
the sampling had to read the whole dataset at every iteration, it could arguably
be slow. (There is a third case, where we keep the data in memory and the sampling
does not have to do any I/O, but then the memory usage is similar to my workaround.)
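To make the contrast concrete, here is a minimal, Flink-independent Scala sketch of
the two sampling strategies; every name in it is hypothetical and for illustration only:
{code}
import scala.util.Random

object SamplingTradeoff {
  type Point = (Array[Double], Double) // (features, label)

  // Strategy A (the workaround): keep the whole dataset in memory and draw
  // each mini-batch from it. Cheap per iteration, but memory grows with the data.
  def inMemoryBatch(data: Vector[Point], batchSize: Int, rnd: Random): Vector[Point] =
    Vector.fill(batchSize)(data(rnd.nextInt(data.size)))

  // Strategy B (sampling with a dynamic path): fetch only the sampled records
  // from an external source at every iteration. Low memory, but every iteration
  // pays the cost of `readSample` (disk/network I/O in practice).
  def sampledBatch(readSample: Int => Vector[Point], batchSize: Int): Vector[Point] =
    readSample(batchSize)

  def main(args: Array[String]): Unit = {
    val rnd  = new Random(42)
    val data = Vector.tabulate(1000)(i => (Array(i.toDouble), 2.0 * i))

    // Strategy A: no I/O per iteration, full dataset resident in memory.
    val batchA = inMemoryBatch(data, 10, rnd)
    // Strategy B: simulate the per-iteration re-read with an on-demand fetch.
    val batchB = sampledBatch(n => Vector.fill(n)(data(rnd.nextInt(data.size))), 10)
    println(s"batch sizes: ${batchA.size} and ${batchB.size}")
  }
}
{code}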
As I see it, the two solutions (my suggested workaround and sampling with a
dynamic path) represent two sides of a memory-performance tradeoff: mine uses too
much memory, while the other may be slow. Do I see it right? Do you think it is
worth choosing the sampling approach here because its performance overhead would
be much lower? Or is my workaround too "hacky", because whether the sampling
happens from memory or from disk should not be baked into the algorithm?
> Stochastic gradient descent optimizer for ML library
> ----------------------------------------------------
>
> Key: FLINK-1807
> URL: https://issues.apache.org/jira/browse/FLINK-1807
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Theodore Vasiloudis
> Labels: ML
>
> Stochastic gradient descent (SGD) is a widely used optimization technique in
> different ML algorithms. Thus, it would be helpful to provide a generalized
> SGD implementation which can be instantiated with the respective gradient
> computation. Such a building block would make the development of future
> algorithms easier.
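For context on what "instantiated with the respective gradient computation" could
look like, here is a minimal, hypothetical Scala sketch of a pluggable gradient;
this is not FlinkML's actual interface, and all names are illustrative only:
{code}
// Hypothetical sketch of a pluggable gradient for a generic SGD step.
// Not FlinkML's actual API; all names are illustrative only.
trait LossGradient {
  // Gradient of the loss at `weights` for one labeled example (features, label).
  def gradient(weights: Array[Double], example: (Array[Double], Double)): Array[Double]
}

object SquaredLossGradient extends LossGradient {
  // Gradient of the squared loss for a linear model: (w . x - y) * x
  def gradient(w: Array[Double], example: (Array[Double], Double)): Array[Double] = {
    val (x, y) = example
    val err = w.zip(x).map { case (wi, xi) => wi * xi }.sum - y
    x.map(_ * err)
  }
}

object GenericSGD {
  // One SGD step over a mini-batch, independent of the concrete loss function.
  def step(w: Array[Double],
           batch: Seq[(Array[Double], Double)],
           lossGrad: LossGradient,
           learningRate: Double): Array[Double] = {
    val avgGrad = batch
      .map(lossGrad.gradient(w, _))
      .reduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })
      .map(_ / batch.size)
    w.zip(avgGrad).map { case (wi, gi) => wi - learningRate * gi }
  }
}
{code}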