[ https://issues.apache.org/jira/browse/SPARK-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-6705:
-----------------------------------
Assignee: (was: Apache Spark)
> MLLIB ML Pipeline's Logistic Regression has no intercept term
> -------------------------------------------------------------
>
> Key: SPARK-6705
> URL: https://issues.apache.org/jira/browse/SPARK-6705
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Reporter: Omede Firouz
>
> Currently, the ML Pipeline's LogisticRegression.scala file does not expose a
> setting for whether to fit an intercept term. The pipeline therefore
> defers to LogisticRegressionWithLBFGS, which does not use an intercept term.
> This makes sense from a performance point of view, because adding an intercept
> term requires memory allocation.
> However, this is statistically undesirable: the usual default is to
> include an intercept term, and one needs a very strong reason to
> omit it.
> Explicitly modeling the intercept by adding a column of all 1s does not
> work, because LogisticRegressionWithLBFGS forces column normalization: a
> column of all 1s has zero variance, so scaling it means dividing by 0,
> which destroys the column.
> We should open up the ML Pipeline API to allow explicit control over
> whether to fit an intercept.
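
The point about the column of all 1s can be illustrated with a small
sketch. This is not Spark code: the helper names (column_std,
scale_to_unit_std) are hypothetical, and the rule that a zero-variance
column is mapped to 0.0 instead of raising a division-by-zero error is an
assumed convention, not something confirmed by the report above.

```python
import math


def column_std(values):
    """Sample standard deviation (n - 1 denominator) of one column."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def scale_to_unit_std(values):
    """Scale a column to unit standard deviation.

    Assumption: a zero-variance column is mapped to all zeros rather than
    dividing by zero.
    """
    std = column_std(values)
    if std == 0.0:
        return [0.0 for _ in values]
    return [v / std for v in values]


feature = [2.0, 4.0, 6.0, 8.0]      # an ordinary feature column
bias_column = [1.0, 1.0, 1.0, 1.0]  # hand-made intercept column

scaled_feature = scale_to_unit_std(feature)
scaled_bias = scale_to_unit_std(bias_column)

print(scaled_feature)
print(scaled_bias)
```

Under that assumption the hand-made intercept column becomes all zeros, so
whatever weight the optimizer assigns to it contributes nothing to the
prediction, and the model is effectively fit without an intercept.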
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)