[ 
https://issues.apache.org/jira/browse/SPARK-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-6705:
-----------------------------------

    Assignee:     (was: Apache Spark)

> MLLIB ML Pipeline's Logistic Regression has no intercept term
> -------------------------------------------------------------
>
>                 Key: SPARK-6705
>                 URL: https://issues.apache.org/jira/browse/SPARK-6705
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: Omede Firouz
>
> Currently, the ML Pipeline's LogisticRegression.scala does not expose a
> setting for whether to fit an intercept term. The pipeline therefore
> defers to LogisticRegressionWithLBFGS, which does not use an intercept term.
> This makes sense from a performance point of view, because adding an
> intercept term requires memory allocation.
> Statistically, however, this is undesirable: the usual default is to include
> an intercept term, and one needs a very strong reason to leave it out.
> Explicitly modeling the intercept by adding a column of all 1s does not
> work, because LogisticRegressionWithLBFGS forces column normalization. A
> column of all 1s has zero variance, so the normalization divides by zero
> and wipes the column out.
> We should open up the ML Pipeline API to explicitly allow controlling
> whether or not to fit an intercept.
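
The zero-variance problem in the description above can be sketched in a few
lines of plain Python. This is an illustration only, not Spark's code: the
helper names column_std and scale_column are hypothetical, and mapping a
zero-variance column to all zeros is an assumption about how a scaler might
guard against the division by zero.

```python
import math


def column_std(values):
    """Sample standard deviation of one feature column."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def scale_column(values):
    """Scale a column to unit standard deviation.

    Assumption: a zero-variance column is mapped to all zeros instead of
    raising ZeroDivisionError. The actual handling inside Spark may differ.
    """
    std = column_std(values)
    if std == 0.0:
        return [0.0] * len(values)
    return [v / std for v in values]


feature = [2.0, 4.0, 6.0, 8.0]  # an ordinary feature column
ones = [1.0, 1.0, 1.0, 1.0]     # hand-made "intercept" column of all 1s

scaled_feature = scale_column(feature)  # ends up with unit standard deviation
scaled_ones = scale_column(ones)        # every entry becomes 0.0
```

Once the column of 1s is scaled to all zeros, its coefficient no longer
affects any prediction, so it cannot act as an intercept. That is why the
intercept would have to be handled as a separate, unscaled parameter.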



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

