[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435449#comment-15435449 ]

Joseph K. Bradley commented on SPARK-17163:
-------------------------------------------

My 2 cents:

*Intercept/coefficients*: I don't think the intercept/coefficients issue is a 
big deal.  Providing extra methods which reshape the data into 
Double/Vector/Matrix as needed seems easy and understandable.
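A minimal sketch of what such reshaping accessors could look like. It assumes a model that always stores a K x n coefficient matrix and a length-K intercept vector (K == 1 for binary) and derives the binary Double/Vector views from them. The class and method names are hypothetical and are not Spark's API.

```python
from typing import List


class HypotheticalLogisticRegressionModel:
    """Hypothetical model (not Spark's API) that always stores a K x n
    coefficient matrix and a length-K intercept vector; K == 1 means binary."""

    def __init__(self, coefficient_matrix: List[List[float]],
                 intercept_vector: List[float]) -> None:
        if len(coefficient_matrix) != len(intercept_vector):
            raise ValueError("need exactly one intercept per coefficient row")
        self.coefficient_matrix = coefficient_matrix
        self.intercept_vector = intercept_vector

    def is_binary(self) -> bool:
        return len(self.coefficient_matrix) == 1

    def coefficients(self) -> List[float]:
        """Binary-only view: the single coefficient row as a flat vector."""
        if not self.is_binary():
            raise ValueError("use coefficient_matrix for multinomial models")
        return self.coefficient_matrix[0]

    def intercept(self) -> float:
        """Binary-only view: the single intercept as a scalar."""
        if not self.is_binary():
            raise ValueError("use intercept_vector for multinomial models")
        return self.intercept_vector[0]


# Usage: a binary model exposes Double/Vector views; a 3-class model only the matrix/vector.
binary = HypotheticalLogisticRegressionModel([[0.5, -1.0]], [0.25])
multi = HypotheticalLogisticRegressionModel(
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0.0, 0.1, 0.2])
print(binary.coefficients(), binary.intercept())  # prints [0.5, -1.0] 0.25
```

Keeping a single storage layout and deriving the narrower views on demand is one way to avoid maintaining two parallel representations of the same parameters.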

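*Pivoting*: This may be the biggest question in my mind.  LOR effectively 
pivots (it fits a single coefficient vector, treating one class as the 
reference), while MLOR does not (it fits one coefficient vector per class). 
This difference in behavior may confuse users, and mushing the two 
together may not be a good idea.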
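To make the pivoting difference concrete, here is a small sketch of the underlying math, not of Spark's implementation. For two classes, the unpivoted softmax probability of class 1 equals the sigmoid of the pivoted margin, i.e. a binary model whose coefficients are w1 - w0 and intercept b1 - b0. Unpivoted coefficients are not unique: adding the same vector to every row leaves the probabilities unchanged, which is why the two estimators can report different coefficients for the same predictions. The numbers and helper names are made up for illustration.

```python
import math
from typing import List, Tuple


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def softmax_probs(coef_rows: List[List[float]], intercepts: List[float],
                  x: List[float]) -> List[float]:
    """Unpivoted multinomial model: one margin (and one coefficient row) per class."""
    margins = [dot(w, x) + b for w, b in zip(coef_rows, intercepts)]
    m = max(margins)  # subtract the max margin for numerical stability
    exps = [math.exp(z - m) for z in margins]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pivot_to_binary(coef_rows: List[List[float]],
                    intercepts: List[float]) -> Tuple[List[float], float]:
    """Pivot a 2-class model on class 0: subtract the reference row from class 1."""
    (w0, w1), (b0, b1) = coef_rows, intercepts
    return [a - c for a, c in zip(w1, w0)], b1 - b0


# Made-up 2-class, 2-feature model and input.
coef_rows = [[0.2, -0.4], [1.0, 0.3]]
intercepts = [0.1, -0.5]
x = [1.5, 2.0]

w, b = pivot_to_binary(coef_rows, intercepts)
p_multi = softmax_probs(coef_rows, intercepts, x)[1]
p_binary = sigmoid(dot(w, x) + b)

# Shifting every unpivoted row by the same vector leaves the probabilities
# unchanged; the pivoted difference w1 - w0 is unaffected by the shift.
shifted = [[wi + 5.0 for wi in row] for row in coef_rows]
p_shifted = softmax_probs(shifted, intercepts, x)[1]
print(p_multi, p_binary, p_shifted)  # all three agree up to floating-point rounding
```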

*Thresholds and Summaries*: We designed threshold/thresholds and model 
summaries for LOR with the expectation that MLOR would not be a separate 
Estimator.  If we do keep them separate, then we will need to deprecate 
"thresholds" in LOR and probably modify the model summary class structure to 
separate binary/multiclass.
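For reference, a sketch of how "threshold" and "thresholds" relate, assuming the per-class rule that Spark ML documents for probabilistic classifiers (predict the class with the largest p/t). With two classes, predicting class 1 iff p1/t1 > p0/t0 and p0 = 1 - p1 reduces to the single cut-off p1 > t1/(t0 + t1) = 1/(1 + t0/t1). The function names are hypothetical, and the sketch simplifies by requiring strictly positive thresholds.

```python
from typing import List


def predict_with_thresholds(probs: List[float], thresholds: List[float]) -> int:
    """Multiclass rule: predict the class with the largest p_k / t_k."""
    if len(probs) != len(thresholds):
        raise ValueError("need one threshold per class")
    if any(t <= 0 for t in thresholds):
        # Simplification for this sketch: require strictly positive thresholds.
        raise ValueError("thresholds must be positive in this sketch")
    ratios = [p / t for p, t in zip(probs, thresholds)]
    # max() returns the first index on ties, so ties go to the lower class.
    return max(range(len(ratios)), key=lambda k: ratios[k])


def equivalent_binary_threshold(thresholds: List[float]) -> float:
    """Single cut-off on p1 implied by a length-2 thresholds array."""
    t0, t1 = thresholds
    return 1.0 / (1.0 + t0 / t1)


def predict_binary(p1: float, threshold: float) -> int:
    return 1 if p1 > threshold else 0


thresholds = [0.3, 0.7]
t = equivalent_binary_threshold(thresholds)  # 0.7, up to rounding
ps = [0.1, 0.5, 0.65, 0.75, 0.9]
print([predict_with_thresholds([1.0 - p1, p1], thresholds) for p1 in ps])  # prints [0, 0, 0, 1, 1]
```

Because the binary case is just a special case of the per-class rule, keeping both parameters on one Estimator means keeping them in sync; splitting the Estimators is what would make "thresholds" redundant in LOR.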

*Realistically*: If we keep them separate, I expect that MLOR would become the 
standard API and LOR would eventually fall out of use.  There just would not be 
much point in using LOR (once MLOR achieves API parity with LOR in terms of 
model summaries, Python API, etc.).  I also think we're setting ourselves up 
for a headache if we maintain 2 code paths for the same functionality, 
especially for prediction, summaries, and model saving/loading.

*Proposal*: I'd propose one of 2 options:
1. Merge LOR and MLOR now into LOR.
2. Keep them separate, with the plan to make MLOR the primary API and 
eventually remove LOR (in Spark 3.0).

I'd vote for Option 1 if we can convince ourselves that users will not be 
confused by a merged API (or if we take the time to do pivoting for MLOR).

P.S.: Sorry for the long absence.  I'm finally re-emerging from a project...

> Decide on unified multinomial and binary logistic regression interfaces
> -----------------------------------------------------------------------
>
>                 Key: SPARK-17163
>                 URL: https://issues.apache.org/jira/browse/SPARK-17163
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Seth Hendrickson
>
> Before the 2.1 release, we should finalize the API for logistic regression. 
> After SPARK-7159, we have both LogisticRegression and 
> MultinomialLogisticRegression models. This may be confusing to users and is 
> a bit superfluous, since MLOR can do basically all of what BLOR does. We 
> should decide if it needs to be changed and implement those changes before 2.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
