[ https://issues.apache.org/jira/browse/SPARK-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong Wang updated SPARK-1682:
-----------------------------

    Description: 
The GradientDescent optimizer samples the data before each gradient step. When 
the input data has already been shuffled, it is possible instead to scan the 
data and take a gradient step for each instance, which could potentially be 
more efficient.
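
For illustration, a minimal local sketch of the scan-and-step idea, not the 
actual MLlib code; the Point type, the logistic-loss gradient, and the scanSgd 
name are assumptions made here:

{code:scala}
// Hypothetical sketch: one gradient step per instance over pre-shuffled data,
// instead of sampling a mini-batch before each step.
object ScanSgdSketch {
  case class Point(label: Double, features: Array[Double])

  private def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // A single pass over the data; assumes a logistic-loss gradient.
  def scanSgd(data: Seq[Point], w0: Array[Double], stepSize: Double): Array[Double] =
    data.foldLeft(w0) { (w, p) =>
      val multiplier = 1.0 / (1.0 + math.exp(-dot(w, p.features))) - p.label
      w.zip(p.features).map { case (wi, xi) => wi - stepSize * multiplier * xi }
    }
}
{code}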

Add an enhanced RDA L1 updater, which can produce even sparser solutions of 
comparable quality to the existing L1 updater. Reference: 
Lin Xiao, "Dual Averaging Methods for Regularized Stochastic Learning and 
Online Optimization", Journal of Machine Learning Research 11 (2010), 2543-2596.

Small fix: add options to the BinaryClassification example to read and write a 
model file.
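
For that small fix, something along these lines could work (the helper names 
and the comma-separated text format are assumptions here, not the actual patch):

{code:scala}
// Hypothetical sketch: persist the learned intercept and weights as a single
// comma-separated line, and read them back to rebuild a model.
import java.io.PrintWriter
import scala.io.Source

object ModelFileSketch {
  def saveWeights(path: String, intercept: Double, weights: Array[Double]): Unit = {
    val out = new PrintWriter(path)
    try out.println((intercept +: weights).mkString(",")) finally out.close()
  }

  def loadWeights(path: String): (Double, Array[Double]) = {
    val values = Source.fromFile(path).getLines().next().split(",").map(_.toDouble)
    (values.head, values.tail)
  }
}
{code}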


  was:
The LogisticRegressionWithSGD example does not expose the following 
capabilities that already exist inside MLlib (a rough usage sketch follows the 
list):
  * reading svmlight data
  * regularization with l1 and l2
  * add intercept
  * write model to a file
  * read model and generate predictions
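
A minimal sketch of wiring those pieces together against the public MLlib API; 
the paths, parameter values, and object name are placeholders rather than the 
example's actual options:

{code:scala}
// Hypothetical sketch: load svmlight data, train logistic regression with an
// L1 updater and an intercept, then generate predictions.
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.optimization.L1Updater
import org.apache.spark.mllib.util.MLUtils

object CapabilitySketch {
  def run(sc: SparkContext): Unit = {
    val data = MLUtils.loadLibSVMFile(sc, "data/sample.svmlight") // read svmlight data
    val algo = new LogisticRegressionWithSGD()
    algo.setIntercept(true)                                       // add intercept
    algo.optimizer
      .setNumIterations(100)
      .setUpdater(new L1Updater)                                  // L1 regularization
      .setRegParam(0.01)
    val model = algo.run(data)
    val predictions = data.map(p => model.predict(p.features))    // generate predictions
    predictions.count()                                           // force evaluation
  }
}
{code}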

The GradientDescent optimizer samples the data before each gradient step. When 
the input data has already been shuffled, it is possible instead to scan the 
data and take a gradient step for each instance, which could potentially be 
more efficient.

        Summary: Add gradient descent w/o sampling and RDA L1 updater  (was: 
LogisticRegressionWithSGD should support svmlight data and gradient descent w/o 
sampling)

> Add gradient descent w/o sampling and RDA L1 updater
> ----------------------------------------------------
>
>                 Key: SPARK-1682
>                 URL: https://issues.apache.org/jira/browse/SPARK-1682
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Dong Wang
>             Fix For: 1.0.0
>
>
> The GradientDescent optimizer samples the data before each gradient step. 
> When the input data has already been shuffled, it is possible instead to 
> scan the data and take a gradient step for each instance, which could 
> potentially be more efficient.
> Add an enhanced RDA L1 updater, which can produce even sparser solutions of 
> comparable quality to the existing L1 updater. Reference: 
> Lin Xiao, "Dual Averaging Methods for Regularized Stochastic Learning and 
> Online Optimization", Journal of Machine Learning Research 11 (2010), 
> 2543-2596.
> Small fix: add options to the BinaryClassification example to read and write 
> a model file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
