[ https://issues.apache.org/jira/browse/SPARK-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590951#comment-15590951 ]

Vincent commented on SPARK-18023:
---------------------------------

I can start with Adam, then maybe the other adaptive (Ada*) methods after that.

> Adam optimizer
> --------------
>
>                 Key: SPARK-18023
>                 URL: https://issues.apache.org/jira/browse/SPARK-18023
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>            Reporter: Vincent
>            Priority: Minor
>
> SGD methods can be incredibly slow to converge (or can diverge) if the
> learning rate alpha is set inappropriately. Many alternative methods have
> been proposed to produce good convergence with less dependence on
> hyperparameter settings and to help escape poor local optima, e.g. Momentum,
> NAG (Nesterov's Accelerated Gradient), Adagrad, RMSProp, etc.
> Among these, Adam is a popular algorithm for first-order gradient-based
> optimization of stochastic objective functions. It has been shown to be well
> suited to problems with large data and/or many parameters, as well as to
> problems with noisy and/or sparse gradients, and it is computationally
> efficient. Refer to this paper for details: https://arxiv.org/pdf/1412.6980v8.pdf
> TensorFlow has implemented most of the adaptive optimization methods
> mentioned above, and we have seen Adam outperform plain SGD in certain
> cases, such as a very sparse dataset in an FM model.
> It would be nice for Spark to have these adaptive optimization methods.
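
For reference, here is a minimal, self-contained Scala sketch of the Adam update rule from the paper (exponentially decayed moment estimates, bias correction, then the parameter step). The class name AdamSketch and its update signature are illustrative only, not an existing Spark API; an actual contribution would need to plug into MLlib's Gradient/Updater or the ML optimization interfaces.

// Sketch of the Adam update (Kingma & Ba, 2014); names are illustrative, not a Spark API.
class AdamSketch(
    stepSize: Double = 0.001,
    beta1: Double = 0.9,
    beta2: Double = 0.999,
    epsilon: Double = 1e-8) {

  private var m: Array[Double] = _  // biased first-moment (mean) estimate
  private var v: Array[Double] = _  // biased second-moment (uncentered variance) estimate
  private var t: Int = 0            // timestep

  /** Returns new weights given the current weights and a stochastic gradient. */
  def update(weights: Array[Double], gradient: Array[Double]): Array[Double] = {
    if (m == null) {
      m = new Array[Double](weights.length)
      v = new Array[Double](weights.length)
    }
    t += 1
    val out = new Array[Double](weights.length)
    var i = 0
    while (i < weights.length) {
      val g = gradient(i)
      // Update biased moment estimates.
      m(i) = beta1 * m(i) + (1.0 - beta1) * g
      v(i) = beta2 * v(i) + (1.0 - beta2) * g * g
      // Bias-corrected estimates.
      val mHat = m(i) / (1.0 - math.pow(beta1, t))
      val vHat = v(i) / (1.0 - math.pow(beta2, t))
      // Parameter step.
      out(i) = weights(i) - stepSize * mHat / (math.sqrt(vHat) + epsilon)
      i += 1
    }
    out
  }
}

The per-coordinate scaling by sqrt(vHat) is what makes Adam attractive for sparse/noisy gradients: rarely updated coordinates keep a small second-moment estimate and effectively get a larger step.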


