Gábor Hermann commented on FLINK-1807:

Thanks for your reply!

Then, if I understand you correctly, this solution would not be appropriate 
because of its excessive memory use.

I believe sampling with a dynamic path could have a significant overhead of its 
own. With that approach, we would have to load a sample/minibatch of the data 
from another resource (disk/network) at every iteration step, which might hurt 
performance. Of course, if the sampling read only the needed sample from disk 
rather than the whole dataset, performance would be acceptable. If, on the other 
hand, the sampling had to read the whole dataset at every iteration, it could be 
quite slow. (There is a third case, where we keep the data in memory and the 
sampling does no IO, but then the memory usage is similar to that of my 
workaround.)
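
To make the three cases concrete, here is a rough sketch in plain Python rather 
than Flink code. It is only an illustration under assumptions of my own: the 
data is a local text file with one record per line, and the function names 
(build_offsets, sample_by_seek, sample_full_scan, sample_in_memory) are 
hypothetical and not part of any Flink API.

```python
import random


def build_offsets(path):
    """One-time pass: record the byte offset at which each line starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def sample_by_seek(path, offsets, k, rng):
    """Case 1: read only the k sampled records by seeking to their offsets."""
    chosen = sorted(rng.sample(offsets, k))
    batch = []
    with open(path, "rb") as f:
        for off in chosen:
            f.seek(off)
            batch.append(f.readline().decode().rstrip("\r\n"))
    return batch


def sample_full_scan(path, k, rng):
    """Case 2: reservoir sampling; every call reads the whole file."""
    reservoir = []
    with open(path) as f:
        for i, line in enumerate(f):
            record = line.rstrip("\r\n")
            if i < k:
                reservoir.append(record)
            else:
                # Keep the new record with probability k / (i + 1).
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = record
    return reservoir


def sample_in_memory(data, k, rng):
    """Case 3: no IO per iteration, but the whole dataset stays in memory."""
    return rng.sample(data, k)
```

In this sketch, case 1 keeps only one integer per record in memory and reads k 
records per iteration, case 2 needs no extra memory but scans the whole file on 
every call, and case 3 does no IO at the cost of holding every record in memory.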

As I see it, the two solutions (my suggested workaround and sampling with a 
dynamic path) represent two sides of a memory-performance tradeoff: mine uses 
too much memory, while the other is (possibly) slow. Do I see this right? Do you 
think it is worth choosing the sampling approach here, because the performance 
overhead would be much lower? Or is my workaround too "hacky", and should the 
algorithm not hard-code whether the sampling happens from memory or from disk?

> Stochastic gradient descent optimizer for ML library
> ----------------------------------------------------
>                 Key: FLINK-1807
>                 URL: https://issues.apache.org/jira/browse/FLINK-1807
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Theodore Vasiloudis
>              Labels: ML
> Stochastic gradient descent (SGD) is a widely used optimization technique in 
> different ML algorithms. Thus, it would be helpful to provide a generalized 
> SGD implementation which can be instantiated with the respective gradient 
> computation. Such a building block would make the development of future 
> algorithms easier.
