[ https://issues.apache.org/jira/browse/SPARK-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dong Wang updated SPARK-1682:
-----------------------------
    Description: 
The GradientDescent optimizer samples the data before each gradient step. When the input data has already been shuffled beforehand, it is possible to scan the data and take a gradient step for each instance instead. This could potentially be more efficient.

Add an enhanced RDA L1 updater, which can produce even sparser solutions of comparable quality to standard L1.

Reference:
Lin Xiao, "Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization", Journal of Machine Learning Research 11 (2010) 2543-2596.

Small fix: add options to the BinaryClassification example to read and write a model file.

  was:
The LogisticRegressionWithSGD example does not expose the following capabilities that already exist inside MLlib:
* reading svmlight data
* regularization with l1 and l2
* add intercept
* write model to a file
* read model and generate predictions

The GradientDescent optimizer samples the data before each gradient step. When the input data has already been shuffled beforehand, it is possible to scan the data and take a gradient step for each instance instead. This could potentially be more efficient.

    Summary: Add gradient descent w/o sampling and RDA L1 updater  (was: LogisticRegressionWithSGD should support svmlight data and gradient descent w/o sampling)

> Add gradient descent w/o sampling and RDA L1 updater
> ----------------------------------------------------
>
>                 Key: SPARK-1682
>                 URL: https://issues.apache.org/jira/browse/SPARK-1682
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Dong Wang
>             Fix For: 1.0.0
>
>
> The GradientDescent optimizer samples the data before each gradient step.
> When the input data has already been shuffled beforehand, it is possible to
> scan the data and take a gradient step for each instance instead. This could
> potentially be more efficient.
> Add an enhanced RDA L1 updater, which can produce even sparser solutions of
> comparable quality to standard L1.
> Reference:
> Lin Xiao, "Dual Averaging Methods for Regularized Stochastic Learning and
> Online Optimization", Journal of Machine Learning Research 11 (2010)
> 2543-2596.
> Small fix: add options to the BinaryClassification example to read and
> write a model file.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
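For illustration, the "gradient descent without sampling" idea described in the issue can be sketched as a single in-order scan over pre-shuffled data, taking one gradient step per instance instead of drawing a random sample each iteration. This is a minimal Python sketch with squared loss, not MLlib's Scala implementation; the function name and parameters are hypothetical:

```python
import numpy as np

def scan_sgd(X, y, step=0.1, epochs=1):
    """Per-instance gradient descent over already-shuffled data.

    Instead of sampling a mini-batch before each step (as the existing
    GradientDescent optimizer does), scan the rows in order and take one
    gradient step per instance. Squared loss is used purely to keep the
    sketch short.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in range(n):                    # single scan over the data
            g = (X[i] @ w - y[i]) * X[i]      # per-instance gradient
            w -= step * g
    return w

# Small demo on noiseless synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = scan_sgd(X, y, step=0.1, epochs=2)
```

The key assumption is that the data were shuffled beforehand; scanning unshuffled (e.g. sorted) data in order would bias the updates.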
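The enhanced RDA L1 updater referenced above (Xiao 2010) keeps a running average of the subgradients and truncates coordinates whose average magnitude falls below a threshold lambda + gamma*rho/sqrt(t); rho > 0 gives the "enhanced" variant with extra sparsity. Below is a minimal Python sketch of the update rule with illustrative constants; it is not MLlib's Updater API:

```python
import numpy as np

def rda_l1_update(avg_grad, t, lam, gamma, rho=0.0):
    """One l1-RDA step (Xiao 2010): truncate the running average of
    subgradients at threshold lam + gamma*rho/sqrt(t); rho > 0 is the
    'enhanced' variant."""
    thresh = lam + gamma * rho / np.sqrt(t)
    return np.where(
        np.abs(avg_grad) <= thresh,
        0.0,
        -(np.sqrt(t) / gamma) * (avg_grad - thresh * np.sign(avg_grad)),
    )

# Demo: one scan of logistic-regression data, RDA update per instance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [1.5, -2.0, 1.0]                 # only 3 relevant features
y = (X @ true_w > 0).astype(float)

w = np.zeros(10)
g_bar = np.zeros(10)
for t in range(1, 201):
    x_t, y_t = X[t - 1], y[t - 1]
    p = 1.0 / (1.0 + np.exp(-x_t @ w))        # logistic prediction
    g = (p - y_t) * x_t                       # per-instance gradient
    g_bar = ((t - 1) * g_bar + g) / t         # running average of gradients
    w = rda_l1_update(g_bar, t, lam=0.05, gamma=5.0, rho=0.005)
```

Unlike subgradient L1 steps, the truncation here is applied to the averaged gradient, which is what lets RDA zero out coordinates aggressively while keeping solution quality comparable.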