Omede Firouz created SPARK-6705:
-----------------------------------
Summary: MLLIB ML Pipeline's Logistic Regression has no intercept term
Key: SPARK-6705
URL: https://issues.apache.org/jira/browse/SPARK-6705
Project: Spark
Issue Type: Improvement
Components: ML, MLlib
Reporter: Omede Firouz
Currently, the ML Pipeline's LogisticRegression.scala does not expose a way to
choose whether to fit an intercept term. The pipeline therefore defers to
LogisticRegressionWithLBFGS, which does not use an intercept term. This makes
sense from a performance point of view, because adding an intercept term
requires additional memory allocation.
However, this is statistically undesirable: the usual default is to include an
intercept term, and omitting it requires a very strong justification.
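To illustrate the concern: without an intercept, a logistic model predicts a
probability of exactly 0.5 for an all-zero feature vector, whatever weights are
fitted. A minimal single-feature sketch in plain Python follows. The `predict`
helper and the weight values are hypothetical and are not part of MLlib.

```python
import math

def predict(x, weight, intercept=0.0):
    """Logistic prediction sigmoid(weight * x + intercept) for one feature.

    Hypothetical helper for illustration only; not an MLlib API.
    """
    return 1.0 / (1.0 + math.exp(-(weight * x + intercept)))

# With no intercept, the prediction at x = 0 is pinned to 0.5 for any weight.
no_intercept = [predict(0.0, w) for w in (-3.0, 0.5, 10.0)]

# With an intercept, the baseline probability at x = 0 can move away from 0.5.
with_intercept = predict(0.0, 0.5, intercept=-2.0)

print(no_intercept)
print(with_intercept)
```

If the true base rate of the positive class is far from 50%, a model without an
intercept cannot represent it at the origin.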
Explicitly modeling the intercept by adding a column of all 1s does not work,
because LogisticRegressionWithLBFGS forces column normalization. A column of
all 1s has zero variance, so scaling it by its standard deviation means
dividing by 0, which wipes the column out.
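The effect can be sketched in plain Python. This is not the MLlib code path:
`column_std` and `scale_columns` are hypothetical helpers, and the sketch
assumes that a zero-variance column is mapped to 0.0 instead of raising a
division error.

```python
import math

def column_std(values):
    """Sample standard deviation (n - 1 denominator) of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def scale_columns(rows):
    """Divide each column by its standard deviation.

    Assumption: a zero-variance column is scaled by 0.0 rather than
    triggering a division by zero. Hypothetical stand-in for the
    normalization step described above.
    """
    columns = list(zip(*rows))
    stds = [column_std(list(col)) for col in columns]
    factors = [1.0 / s if s != 0.0 else 0.0 for s in stds]
    return [[v * f for v, f in zip(row, factors)] for row in rows]

# First column is the hand-added "intercept" column of all 1s.
rows = [[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]]
scaled = scale_columns(rows)

for row in scaled:
    print(row)
```

After scaling, the constant column carries no information, so its weight cannot
act as an intercept.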
We should open up the ML Pipeline API so that users can explicitly control
whether or not to fit an intercept.