[
https://issues.apache.org/jira/browse/SPARK-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-18023.
----------------------------------
Resolution: Incomplete
> Adam optimizer
> --------------
>
> Key: SPARK-18023
> URL: https://issues.apache.org/jira/browse/SPARK-18023
> Project: Spark
> Issue Type: New Feature
> Components: ML, MLlib
> Reporter: Vincent
> Priority: Minor
> Labels: bulk-closed
>
> SGD methods can converge (or diverge) incredibly slowly when the learning rate
> alpha is set inappropriately. Many alternative methods have been proposed to
> produce reliable convergence with less dependence on hyperparameter settings
> and to help escape poor local optima, e.g. Momentum, NAG (Nesterov's
> Accelerated Gradient), Adagrad, RMSProp, etc.
> Among these, Adam is one of the most popular; it is a first-order,
> gradient-based optimizer for stochastic objective functions. It has been shown
> to be well suited to problems with large data and/or many parameters and to
> problems with noisy and/or sparse gradients, and it is computationally
> efficient (a minimal sketch of the update rule follows the quoted description
> below). Refer to the paper for details: https://arxiv.org/pdf/1412.6980v8.pdf
> In fact, TensorFlow already implements most of the adaptive optimization
> methods mentioned above, and we have seen Adam outperform plain SGD in certain
> cases, such as training an FM model on a very sparse dataset.
> It would be nice for Spark to offer these adaptive optimization methods.
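
For reference, here is a minimal, self-contained sketch of the Adam update rule
from the cited paper (Kingma & Ba, 2014), applied to a toy quadratic objective.
The object name AdamSketch, the toy objective, and the step size used here are
illustrative assumptions; this is not an existing or proposed Spark/MLlib API.

// Adam update sketch, following the notation of Kingma & Ba (2014).
// Illustrative only; not an existing or proposed Spark API.
object AdamSketch {
  def main(args: Array[String]): Unit = {
    val alpha = 0.1     // step size (the paper's default is 0.001; larger here for the toy problem)
    val beta1 = 0.9     // decay rate for the first moment estimate
    val beta2 = 0.999   // decay rate for the second raw moment estimate
    val eps   = 1e-8    // small constant for numerical stability

    // Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w.
    val w = Array(5.0, -3.0)
    val m = Array.fill(w.length)(0.0)   // first moment (mean of gradients)
    val v = Array.fill(w.length)(0.0)   // second raw moment (uncentered variance)

    for (t <- 1 to 1000) {
      val g = w.clone()                 // gradient of the toy objective at w
      for (i <- w.indices) {
        m(i) = beta1 * m(i) + (1 - beta1) * g(i)
        v(i) = beta2 * v(i) + (1 - beta2) * g(i) * g(i)
        val mHat = m(i) / (1 - math.pow(beta1, t))  // bias-corrected first moment
        val vHat = v(i) / (1 - math.pow(beta2, t))  // bias-corrected second moment
        w(i) -= alpha * mHat / (math.sqrt(vHat) + eps)
      }
    }
    println(w.mkString(", "))           // the iterate approaches the minimizer (0.0, 0.0)
  }
}

Driven by minibatch gradients instead of the toy gradient above, this
per-coordinate update is roughly what an MLlib-side implementation of this
request would compute at each step.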