Vincent created SPARK-18023:
-------------------------------
Summary: Adam optimizer
Key: SPARK-18023
URL: https://issues.apache.org/jira/browse/SPARK-18023
Project: Spark
Issue Type: New Feature
Components: ML, MLlib
Reporter: Vincent
Priority: Minor
SGD methods can converge very slowly, or even diverge, if the learning rate alpha is set
inappropriately. Many alternative methods have been proposed to achieve good convergence
with less dependence on hyperparameter settings and to help escape local optima, e.g.
Momentum, NAG (Nesterov's Accelerated Gradient), Adagrad, RMSProp, etc.
Among these, Adam is a popular algorithm for first-order gradient-based optimization of
stochastic objective functions. It has been shown to be well suited for problems with large
data and/or many parameters, and for problems with noisy and/or sparse gradients, while
remaining computationally efficient. Refer to this paper for details:
https://arxiv.org/pdf/1412.6980v8.pdf
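For reference, a minimal sketch of the Adam update rule from the paper. This is not an
existing Spark/MLlib API; the AdamUpdater class name, default values, and update signature
below are illustrative only:

{code:scala}
// Sketch of the Adam update rule (Kingma & Ba, 2014). Defaults follow the paper.
class AdamUpdater(
    alpha: Double = 0.001,   // step size
    beta1: Double = 0.9,     // exponential decay rate for the first moment estimate
    beta2: Double = 0.999,   // exponential decay rate for the second moment estimate
    epsilon: Double = 1e-8) {

  private var m: Array[Double] = _   // first moment estimate (mean of gradients)
  private var v: Array[Double] = _   // second moment estimate (uncentered variance)
  private var t: Int = 0             // timestep

  /** Returns updated weights given the current weights and a stochastic gradient. */
  def update(weights: Array[Double], gradient: Array[Double]): Array[Double] = {
    if (m == null) {
      m = Array.fill(weights.length)(0.0)
      v = Array.fill(weights.length)(0.0)
    }
    t += 1
    val biasCorr1 = 1.0 - math.pow(beta1, t)
    val biasCorr2 = 1.0 - math.pow(beta2, t)
    val updated = new Array[Double](weights.length)
    var i = 0
    while (i < weights.length) {
      m(i) = beta1 * m(i) + (1.0 - beta1) * gradient(i)
      v(i) = beta2 * v(i) + (1.0 - beta2) * gradient(i) * gradient(i)
      val mHat = m(i) / biasCorr1           // bias-corrected first moment
      val vHat = v(i) / biasCorr2           // bias-corrected second moment
      updated(i) = weights(i) - alpha * mHat / (math.sqrt(vHat) + epsilon)
      i += 1
    }
    updated
  }
}
{code}

An actual implementation would presumably plug into the existing optimization interfaces
(e.g. the mllib.optimization Updater/GradientDescent classes), but that design is left
open here.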
In fact, TensorFlow has implemented most of the adaptive optimization methods mentioned
above, and we have seen Adam outperform most SGD variants in certain cases, such as an FM
model trained on a very sparse dataset.
It would be nice for Spark to have these adaptive optimization methods as well.