Omede Firouz created SPARK-6705:
-----------------------------------

             Summary: MLLIB ML Pipeline's Logistic Regression has no intercept term
                 Key: SPARK-6705
                 URL: https://issues.apache.org/jira/browse/SPARK-6705
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
            Reporter: Omede Firouz


Currently, the ML Pipeline's LogisticRegression.scala does not allow setting
whether to fit an intercept term. The pipeline therefore defers to
LogisticRegressionWithLBFGS with its default settings, which do not fit an
intercept term. This makes sense from a performance point of view, because
adding an intercept term requires memory allocation: every feature vector must
be copied into a longer vector with an extra constant entry.
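A minimal pure-Python sketch of where that allocation comes from, assuming the intercept is implemented by appending a constant 1.0 to every feature vector (the idea behind MLlib's `MLUtils.appendBias`). `append_bias` and the toy data are hypothetical illustrations, not Spark code.

```python
def append_bias(features):
    """Return a new feature list with a constant 1.0 appended as the bias term.

    The input is not modified: every example gets a freshly allocated, longer
    copy, which is the extra memory cost of fitting an intercept this way.
    """
    return list(features) + [1.0]


# Hypothetical toy dataset of three 2-dimensional examples.
dataset = [[0.5, 2.0], [1.5, 0.0], [3.0, 4.0]]
with_bias = [append_bias(x) for x in dataset]

print(with_bias[0])  # → [0.5, 2.0, 1.0]
```

The original vectors survive unchanged, so for the duration of training both copies exist, which is the overhead that skipping the intercept avoids.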

However, this is statistically undesirable: the usual statistical default is
to include an intercept term, and omitting one requires a very strong
justification.
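To make the statistical argument concrete: with weights w and intercept b, logistic regression models the probability below. Fixing b = 0, which is what happens when no intercept is fitted, forces the log-odds at x = 0 to be exactly 0 and forces the decision boundary through the origin, regardless of what the data say.

```latex
P(y = 1 \mid x) = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

b = 0 \;\Longrightarrow\; P(y = 1 \mid x = 0) = \sigma(0) = \tfrac{1}{2},
\qquad \text{decision boundary } \{\, x : w^\top x = 0 \,\} \ni 0
```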

Explicitly modeling the intercept by adding a column of all 1s does not work
either, because LogisticRegressionWithLBFGS forces feature scaling (each column
is divided by its standard deviation). A column of all 1s has zero variance, so
scaling by a standard deviation of 0 wipes it out.
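A minimal pure-Python sketch of this failure mode, assuming scaling divides each column by its sample standard deviation without centering. The sketch maps zero-variance columns to 0 instead of dividing by zero (an assumed convention; naive division would fail outright), and either way the hand-built intercept column carries no information after scaling. `scale_columns` and the data are hypothetical.

```python
import math


def column_std(rows, j):
    """Sample standard deviation (n - 1 denominator) of column j."""
    n = len(rows)
    mean = sum(r[j] for r in rows) / n
    var = sum((r[j] - mean) ** 2 for r in rows) / (n - 1)
    return math.sqrt(var)


def scale_columns(rows):
    """Divide each column by its standard deviation (no mean centering).

    Zero-variance columns are mapped to 0.0 rather than divided by zero.
    """
    stds = [column_std(rows, j) for j in range(len(rows[0]))]
    return [
        [x / s if s != 0.0 else 0.0 for x, s in zip(row, stds)]
        for row in rows
    ]


# Hypothetical data: the last column is a hand-added all-1s "intercept" column.
rows = [
    [0.5, 2.0, 1.0],
    [1.5, 0.0, 1.0],
    [3.0, 4.0, 1.0],
]
scaled = scale_columns(rows)
print([row[-1] for row in scaled])  # → [0.0, 0.0, 0.0]
```

The real features keep nonzero values after scaling, while the constant column collapses to all zeros, so no weight on it can act as an intercept.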

We should extend the ML Pipeline API so that callers can explicitly control
whether to fit an intercept.
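A minimal sketch of what such a switch could look like, in pure Python rather than Spark: `train_logistic`, `predict_proba`, and the `fit_intercept` flag are hypothetical names, and plain batch gradient descent stands in for L-BFGS. The intercept `b` is kept as a separate parameter instead of a column of 1s, so any feature scaling applied to the inputs would leave it untouched.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(y = 1 | x) under weights w and intercept b."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(xs, ys, fit_intercept=True, lr=0.1, epochs=5000):
    """Fit binary logistic regression by plain batch gradient descent.

    The intercept b is a separate parameter, not a column of 1s, so feature
    scaling applied to xs could never zero it out. When fit_intercept is
    False, b stays fixed at 0.0.
    """
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        if fit_intercept:
            b -= lr * grad_b / n
    return w, b


# Hypothetical 1-D data whose positive labels sit at larger x.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
ys = [0, 0, 1, 0, 1, 1]

w_int, b_int = train_logistic(xs, ys, fit_intercept=True)
w_no, b_no = train_logistic(xs, ys, fit_intercept=False)

print(predict_proba(w_no, b_no, [0.0]))  # → 0.5, since b is fixed at 0
print(b_int, predict_proba(w_int, b_int, [0.0]))  # learned b < 0
```

With `fit_intercept=False` the model is pinned to exactly 0.5 at x = 0; with the intercept enabled, b is learned from the data and comes out negative here, pulling P(y = 1 | x = 0) below 0.5.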



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
