Seth Hendrickson created SPARK-21405:
----------------------------------------
Summary: Add LBFGS solver for GeneralizedLinearRegression
Key: SPARK-21405
URL: https://issues.apache.org/jira/browse/SPARK-21405
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 2.3.0
Reporter: Seth Hendrickson
GeneralizedLinearRegression in Spark ML currently supports at most 4096 features
because its only optimizer is IRLS, which in turn uses WLS, and WLS relies on
collecting the covariance matrix to the driver. GLMs can also be fit by simple
gradient-based methods like LBFGS.
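
To illustrate why a gradient-based fit avoids the driver-side matrix, here is a minimal sketch, not the Spark implementation. It assumes a Poisson family with a log link, and the names `poisson_loss_and_gradient` and `fit` are hypothetical. Plain gradient descent stands in for LBFGS only to keep the example dependency-free; the point is that each pass aggregates one scalar and one length-d vector, never a d x d matrix.

```python
import math


def poisson_loss_and_gradient(rows, beta):
    """One pass over the data for a Poisson GLM with a log link.

    Returns the mean negative log-likelihood (dropping the log(y!) constant)
    and its gradient. The running state is one scalar plus one length-d
    vector, so nothing of size d x d is ever built.
    """
    d = len(beta)
    loss = 0.0
    grad = [0.0] * d
    for x, y in rows:
        eta = sum(b * xi for b, xi in zip(beta, x))  # linear predictor
        mu = math.exp(eta)                           # inverse of the log link
        loss += mu - y * eta
        for j in range(d):
            grad[j] += (mu - y) * x[j]
    n = len(rows)
    return loss / n, [g / n for g in grad]


def fit(rows, d, step=0.01, iters=5000):
    # Plain gradient descent with a fixed step; a stand-in for LBFGS,
    # which would consume the same (loss, gradient) pair.
    beta = [0.0] * d
    for _ in range(iters):
        _, grad = poisson_loss_and_gradient(rows, beta)
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta


# Toy data: (features with a leading 1.0 intercept term, count response).
rows = [([1.0, 0.0], 1), ([1.0, 1.0], 3), ([1.0, 2.0], 7), ([1.0, 3.0], 20)]
beta = fit(rows, d=2)
```

In a distributed setting the per-row loop would become a tree aggregation of the same (loss, gradient) pair, so the data sent to the driver grows linearly with the number of features rather than quadratically.
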
The new API from
[SPARK-19762|https://issues.apache.org/jira/browse/SPARK-19762] makes this easy
to add. I've already prototyped it, and it works pretty well. This change would
lift the feature limit to whatever can fit on a single node, as is already the
case for Linear/Logistic regression.
For reference, other GLM packages, e.g. statsmodels and H2O, also support this.