Dong Wang created SPARK-1682:
--------------------------------

Summary: LogisticRegressionWithSGD should support svmlight data and gradient descent w/o sampling
Key: SPARK-1682
URL: https://issues.apache.org/jira/browse/SPARK-1682
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.0.0
Reporter: Dong Wang
Fix For: 1.0.0
The LogisticRegressionWithSGD example does not expose the following capabilities that already exist inside MLlib:

* reading svmlight data
* regularization with L1 and L2
* adding an intercept
* writing the model to a file
* reading a model and generating predictions

The GradientDescent optimizer samples a subset of the data before each gradient step. When the input data has already been shuffled beforehand, it is possible instead to scan the data in order and take a gradient step for each data instance. This could potentially be more efficient.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
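For illustration only, here is a minimal, dependency-free Python sketch of the two proposals: parsing svmlight lines and training logistic regression by scanning pre-shuffled data with one gradient step per instance, without sampling. It is not MLlib code. The function names (`parse_svmlight_line`, `sequential_sgd`, `margin`, `predict`) are hypothetical. It assumes the basic svmlight line format `label idx:val ...` with 1-based indices and no `qid:` fields, and it maps labels > 0 to 1 and all others to 0. Only the intercept and L2 regularization from the list are shown. MLlib itself reads this format with `MLUtils.loadLibSVMFile`. Note that setting GradientDescent's `miniBatchFraction` to 1.0 still produces one full-batch gradient per iteration, not one update per instance.

```python
import math


def parse_svmlight_line(line):
    """Parse one svmlight line of the form 'label idx:val idx:val ...'.

    Assumes 1-based indices (shifted to 0-based here). Any label > 0 becomes
    1.0 and any other label (e.g. -1 or 0) becomes 0.0 for the logistic loss.
    """
    line = line.split("#", 1)[0].strip()  # drop a trailing '# comment'
    tokens = line.split()
    label = 1.0 if float(tokens[0]) > 0 else 0.0
    features = {}
    for tok in tokens[1:]:
        idx, val = tok.split(":")
        features[int(idx) - 1] = float(val)
    return label, features


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def margin(weights, intercept, features):
    """w . x + b over a sparse feature dict."""
    return intercept + sum(weights[i] * v for i, v in features.items())


def sequential_sgd(data, num_features, step_size=0.1, reg_param=0.0,
                   num_passes=1, add_intercept=True):
    """Logistic regression by scanning (already shuffled) data in order.

    One gradient step is taken per instance; nothing is sampled. reg_param
    applies L2 regularization to the weights but not to the intercept.
    """
    weights = [0.0] * num_features
    intercept = 0.0
    for _ in range(num_passes):
        for label, features in data:
            # Derivative of the log-loss with respect to the margin
            error = sigmoid(margin(weights, intercept, features)) - label
            # L2 term: gradient of (reg_param / 2) * ||w||^2 is reg_param * w
            if reg_param > 0.0:
                for i in range(num_features):
                    weights[i] -= step_size * reg_param * weights[i]
            # Loss term only touches the nonzero features of this instance
            for i, v in features.items():
                weights[i] -= step_size * error * v
            if add_intercept:
                intercept -= step_size * error
    return weights, intercept


def predict(weights, intercept, features, threshold=0.5):
    """Return 1.0 if P(label = 1 | x) exceeds threshold, else 0.0."""
    return 1.0 if sigmoid(margin(weights, intercept, features)) > threshold else 0.0


if __name__ == "__main__":
    lines = ["1 1:2.0 2:1.0", "-1 1:-1.5 2:-2.0", "1 1:1.0 3:0.5", "-1 2:-1.0 3:-0.5"]
    data = [parse_svmlight_line(l) for l in lines]
    w, b = sequential_sgd(data, num_features=3, step_size=0.5, num_passes=20)
    print([predict(w, b, f) for _, f in data])
```

Scanning in order behaves like stochastic gradient descent only because the data was shuffled beforehand, which is the precondition stated in the issue. On sorted input (for example, all positive labels first), per-instance updates would be biased toward the most recently seen examples.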