yuhao yang created SPARK-20602:
----------------------------------
Summary: Adding LBFGS as optimizer for LinearSVC
Key: SPARK-20602
URL: https://issues.apache.org/jira/browse/SPARK-20602
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 2.2.0
Reporter: yuhao yang
Currently LinearSVC in Spark only supports OWLQN as the optimizer (see
https://issues.apache.org/jira/browse/SPARK-14709). I compared LBFGS and
OWLQN on several public datasets and found that LBFGS converges much faster
for LinearSVC in most cases.
The following table shows, for both optimizers, the number of training
iterations until convergence and the resulting F1 score (in parentheses):
||Dataset||LBFGS iterations (F1)||OWLQN iterations (F1)||
|news20.binary|31 (0.99)|413 (0.99)|
|mushroom|28 (1.0)|170 (1.0)|
|madelon|143 (0.75)|8129 (0.70)|
|breast-cancer-scale|15 (1.0)|16 (1.0)|
|phishing|329 (0.94)|231 (0.94)|
|a1a(adult)|466 (0.87)|282 (0.87)|
|a7a|237 (0.84)|372 (0.84)|
Data source: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
Training code: {{new LinearSVC().setMaxIter(10000).setTol(1e-6)}}
LBFGS requires fewer iterations in most cases (five of the seven datasets; the
exceptions are phishing and a1a) while reaching the same or better F1 score, so
it is probably a better default optimizer.
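To make "most cases" concrete, here is a small, stdlib-only Scala sketch that tallies the iteration counts from the table. The names `IterationComparison`, `Result` and `lbfgsWins` are hypothetical and not part of Spark; the code only counts the datasets where LBFGS converged in strictly fewer iterations and prints the OWLQN/LBFGS iteration ratio per dataset.

```scala
// Hypothetical helper: iteration counts until convergence, copied from the
// table in this issue (F1 scores are left out of the tally).
object IterationComparison {
  final case class Result(dataset: String, lbfgsIters: Int, owlqnIters: Int) {
    // A ratio above 1 means OWLQN needed more iterations than LBFGS.
    def owlqnToLbfgsRatio: Double = owlqnIters.toDouble / lbfgsIters
  }

  val results: Seq[Result] = Seq(
    Result("news20.binary", 31, 413),
    Result("mushroom", 28, 170),
    Result("madelon", 143, 8129),
    Result("breast-cancer-scale", 15, 16),
    Result("phishing", 329, 231),
    Result("a1a(adult)", 466, 282),
    Result("a7a", 237, 372)
  )

  // Datasets on which LBFGS converged in strictly fewer iterations than OWLQN.
  def lbfgsWins: Seq[String] =
    results.filter(r => r.lbfgsIters < r.owlqnIters).map(_.dataset)

  def main(args: Array[String]): Unit = {
    results.foreach { r =>
      println(f"${r.dataset}%s: OWLQN/LBFGS iteration ratio = ${r.owlqnToLbfgsRatio}%.2f")
    }
    println(s"LBFGS needed fewer iterations on ${lbfgsWins.size} of ${results.size} datasets")
  }
}
```

Running `main` reports that LBFGS needed fewer iterations on 5 of the 7 datasets; phishing and a1a(adult) are the two exceptions.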