[
https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435449#comment-15435449
]
Joseph K. Bradley commented on SPARK-17163:
-------------------------------------------
My 2 cents:
*Intercept/coefficients*: I don't think the intercept/coefficients issue is a
big deal. Providing extra methods which reshape the data into
Double/Vector/Matrix as needed seems easy and understandable.
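To illustrate the reshaping idea, here is a minimal sketch in plain Python. The class and property names are hypothetical and are not Spark's API. It assumes the model always stores a coefficient matrix and an intercept vector, and that a binary model is stored as a single row:

```python
from typing import List


class UnifiedLogisticModel:
    """Hypothetical model that always stores a matrix and a vector internally."""

    def __init__(self, coefficient_matrix: List[List[float]],
                 intercept_vector: List[float]) -> None:
        if len(coefficient_matrix) != len(intercept_vector):
            raise ValueError("one intercept is required per coefficient row")
        self.coefficient_matrix = coefficient_matrix
        self.intercept_vector = intercept_vector

    @property
    def is_binary(self) -> bool:
        # Assumption: a binary model is represented by exactly one row.
        return len(self.coefficient_matrix) == 1

    @property
    def coefficients(self) -> List[float]:
        # Vector view, only meaningful for the binary case.
        if not self.is_binary:
            raise ValueError("multiclass model: use coefficient_matrix instead")
        return list(self.coefficient_matrix[0])

    @property
    def intercept(self) -> float:
        # Scalar view, only meaningful for the binary case.
        if not self.is_binary:
            raise ValueError("multiclass model: use intercept_vector instead")
        return self.intercept_vector[0]
```

With this shape, binary users keep the familiar vector and scalar accessors, while multiclass users read the matrix and vector directly.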
*Pivoting*: This may be the biggest question in my mind. The different
behavior between LOR (binary logistic regression) and MLOR (multinomial
logistic regression) may be confusing to users, and merging the two may not
be a good idea.
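The difference can be made concrete with a small numeric sketch. It assumes that "pivoting" here means fixing one class as the reference so that its coefficients become zero. The helper names and the numbers are illustrative only. With two classes, the softmax probability from the unpivoted coefficients matches the sigmoid probability from the pivoted ones:

```python
import math


def softmax(margins):
    # Subtract the max margin for numerical stability.
    m = max(margins)
    exps = [math.exp(v - m) for v in margins]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def pivot(rows, intercepts, ref=0):
    """Subtract the reference class's parameters from every class."""
    ref_row, ref_b = rows[ref], intercepts[ref]
    new_rows = [[w - r for w, r in zip(row, ref_row)] for row in rows]
    new_b = [b - ref_b for b in intercepts]
    return new_rows, new_b


# Made-up two-class multinomial parameters and one feature vector.
rows = [[0.2, -0.4], [1.0, 0.3]]
intercepts = [0.1, -0.5]
x = [2.0, 1.0]

# Multinomial view: one margin per class, normalized with softmax.
p_multinomial = softmax([dot(w, x) + b for w, b in zip(rows, intercepts)])

# Binary view: class 0 is the pivot, so only class 1's parameters remain.
pivoted_rows, pivoted_b = pivot(rows, intercepts, ref=0)
p_binary = sigmoid(dot(pivoted_rows[1], x) + pivoted_b[1])
```

Both views give the same predictions, but the stored coefficients differ, which is the kind of behavior a merged API would have to explain to users.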
*Thresholds and Summaries*: We designed threshold/thresholds and model
summaries for LOR with the expectation that MLOR would not be a separate
Estimator. If we do keep them separate, then we will need to deprecate
"thresholds" in LOR and probably modify the model summary class structure to
separate binary/multiclass.
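As a sketch of how the two parameters relate, assume the multiclass rule predicts the class with the largest ratio of probability to threshold, and that a binary threshold t corresponds to thresholds [1 - t, t]. The function names below are hypothetical and the code is plain Python, not Spark's implementation:

```python
def predict_with_threshold(p1, threshold):
    """Binary rule: predict class 1 when its probability exceeds the threshold."""
    return 1 if p1 > threshold else 0


def predict_with_thresholds(probs, thresholds):
    """Multiclass rule (assumed): pick the class with the largest p / t.

    Assumes every threshold is strictly positive.
    """
    scaled = [p / t for p, t in zip(probs, thresholds)]
    return max(range(len(scaled)), key=scaled.__getitem__)


t = 0.7
for p1 in (0.1, 0.5, 0.69, 0.71, 0.9):
    binary = predict_with_threshold(p1, t)
    multiclass = predict_with_thresholds([1.0 - p1, p1], [1.0 - t, t])
    # For two classes the rules agree: p1 / t > (1 - p1) / (1 - t) iff p1 > t.
    assert binary == multiclass
```

Under these assumptions "threshold" is a special case of "thresholds", so a merged Estimator can support both, while separate Estimators would leave "thresholds" redundant in LOR.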
*Realistically*: If we keep them separate, I expect that MLOR would become the
standard API and LOR would eventually fall out of use. There just would not be
much point in using LOR (once MLOR achieves API parity with LOR in terms of
model summaries, Python API, etc.). I also think we're setting ourselves up
for a headache if we maintain 2 code paths for the same functionality,
especially for prediction, summaries, and model saving/loading.
*Proposal*: I'd propose one of 2 options:
1. Merge LOR and MLOR now into LOR.
2. Keep them separate, with the plan to make MLOR the primary API and
eventually remove LOR (in Spark 3.0).
I'd vote for Option 1 if we can convince ourselves that users will not be
confused by a merged API (or if we take the time to do pivoting for MLOR).
P.S.: Sorry for the long absence. I'm finally re-emerging from a project...
> Decide on unified multinomial and binary logistic regression interfaces
> -----------------------------------------------------------------------
>
> Key: SPARK-17163
> URL: https://issues.apache.org/jira/browse/SPARK-17163
> Project: Spark
> Issue Type: Sub-task
> Components: ML, MLlib
> Reporter: Seth Hendrickson
>
> Before the 2.1 release, we should finalize the API for logistic regression.
> After SPARK-7159, we have both LogisticRegression and
> MultinomialLogisticRegression models. This may be confusing to users, and it
> is a bit superfluous since MLOR can do basically all of what BLOR does. We
> should decide if it needs to be changed and implement those changes before 2.1.