[ https://issues.apache.org/jira/browse/FLINK-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585716#comment-15585716 ]
Gábor Hermann commented on FLINK-1807:
--------------------------------------

Thanks for your reply! If I understand you correctly, then, this solution would not be suitable because of its excessive memory use.

I believe sampling with a dynamic path could carry another significant overhead. With that approach, we would have to load a sample/mini-batch of the data from another resource (disk/network) at every iteration step, which might hurt performance. Of course, if the sampling read only the needed sample from disk rather than the whole data, performance would be acceptable. If, on the other hand, sampling had to read the whole data at every iteration, it would arguably be slow. (There is a third case, where we keep the data in memory and the sampling does no IO, but then memory usage is similar to that of my workaround.)

As I see it, the two solutions (my suggested workaround and sampling with a dynamic path) represent the two sides of a memory-performance tradeoff: mine uses too much memory, while the other is possibly slow. Do I see it right? Do you think it is worth choosing the sampling approach here because its performance overhead would be much lower? Or is my workaround too "hacky", in that whether the sampling happens from memory or from disk should not be baked into the algorithm?

> Stochastic gradient descent optimizer for ML library
> ----------------------------------------------------
>
> Key: FLINK-1807
> URL: https://issues.apache.org/jira/browse/FLINK-1807
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Theodore Vasiloudis
> Labels: ML
>
> Stochastic gradient descent (SGD) is a widely used optimization technique in
> different ML algorithms. Thus, it would be helpful to provide a generalized
> SGD implementation which can be instantiated with the respective gradient
> computation.
> Such a building block would make the development of future
> algorithms easier.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
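The issue asks for a generalized SGD loop that can be instantiated with the respective gradient computation, and the comment above debates whether mini-batches should be sampled from memory or re-read at every iteration. As a minimal, language-agnostic sketch (not Flink code; `sgd`, `squared_loss_grad`, and all parameters here are hypothetical names for illustration), the following shows an SGD loop parameterized by a caller-supplied gradient function, sampling mini-batches from an in-memory dataset, i.e. the memory-resident side of the tradeoff:

```python
import random

def sgd(data, dim, gradient, lr=0.1, iterations=100, batch_size=10, seed=42):
    """Generic SGD: `gradient(weights, example)` is supplied by the caller,
    so the same loop serves any differentiable loss."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(iterations):
        # Sampling from memory: cheap per iteration, but the whole
        # dataset must stay resident (the memory side of the tradeoff).
        batch = rng.sample(data, min(batch_size, len(data)))
        for example in batch:
            g = gradient(w, example)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Example instantiation: squared loss for a 1-D linear model y = w*x.
def squared_loss_grad(w, example):
    x, y = example
    err = w[0] * x - y
    return [2.0 * err * x]

data = [(0.1 * i, 3.0 * (0.1 * i)) for i in range(1, 21)]
w = sgd(data, dim=1, gradient=squared_loss_grad, lr=0.05, iterations=200)
# w converges toward the true coefficient 3.0
```

A dynamic-path variant would replace the `rng.sample(...)` line with a read of just the sampled mini-batch from disk or the network, trading that per-iteration IO cost for a much smaller memory footprint.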