Joseph K. Bradley created SPARK-21729:
-----------------------------------------
Summary: Generic test for ProbabilisticClassifier to ensure
consistent output columns
Key: SPARK-21729
URL: https://issues.apache.org/jira/browse/SPARK-21729
Project: Spark
Issue Type: Test
Components: ML
Affects Versions: 2.2.0
Reporter: Joseph K. Bradley
One challenge with the ProbabilisticClassifier abstraction is that it
introduces different code paths for predictions depending on which output
columns are turned on or off: probability, rawPrediction, prediction. We ran
into a bug in MLOR (multinomial logistic regression) with this.
This task is to add a generic test, usable in all test suites for
ProbabilisticClassifier types, that does the following:
* Take a dataset + Estimator
* Fit the Estimator
* Test prediction using the model with all combinations of output columns
turned on/off.
* Make sure the output column values match across combinations, presumably by
comparing against the case with all 3 output columns turned on
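
The steps above could be sketched roughly as follows. This is only an
illustration, not the test that was eventually merged. The object name
OutputColumnConsistencyCheck, the helper checkOutputColumns, the temporary
column names ("rawPredictionAll", "rawPredictionSome", etc.), the toy data, and
the choice of LogisticRegression are all assumptions made for the example. It
relies on the fact that setting an output column name to the empty string turns
that column off, and it compares values with exact equality, which may need to
be relaxed to a tolerance for some models.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel, ProbabilisticClassificationModel}
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical helper object; the names are illustrative only.
object OutputColumnConsistencyCheck {

  // All 2^3 = 8 on/off combinations. An empty string turns a column off.
  val combinations: Seq[(String, String, String)] =
    for {
      raw  <- Seq("", "rawPredictionSome")
      prob <- Seq("", "probabilitySome")
      pred <- Seq("", "predictionSome")
    } yield (raw, prob, pred)

  def checkOutputColumns[M <: ProbabilisticClassificationModel[Vector, M]](
      model: M, data: Dataset[_]): Unit = {
    // Reference output: all three columns turned on.
    val reference = model.copy(ParamMap.empty)
      .setRawPredictionCol("rawPredictionAll")
      .setProbabilityCol("probabilityAll")
      .setPredictionCol("predictionAll")
      .transform(data)

    for ((raw, prob, pred) <- combinations) {
      val partial = model.copy(ParamMap.empty)
        .setRawPredictionCol(raw)
        .setProbabilityCol(prob)
        .setPredictionCol(pred)
      // Transform the reference output so both sets of columns sit in one row.
      val result = partial.transform(reference)
      // Only compare the columns that are turned on in this combination.
      val pairs = Seq(
        raw -> "rawPredictionAll",
        prob -> "probabilityAll",
        pred -> "predictionAll").filter(_._1.nonEmpty)
      result.collect().foreach { row =>
        pairs.foreach { case (some, all) =>
          assert(row.getAs[Any](some) == row.getAs[Any](all),
            s"Column $some differs from $all for combination ($raw, $prob, $pred)")
        }
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("OutputColumnConsistencyCheck")
      .getOrCreate()
    import spark.implicits._

    // Toy dataset, made up for the example.
    val data = Seq(
      (0.0, Vectors.dense(0.0, 1.1)),
      (1.0, Vectors.dense(2.0, 1.0)),
      (0.0, Vectors.dense(1.5, 1.3)),
      (1.0, Vectors.dense(0.5, -0.5)),
      (1.0, Vectors.dense(1.0, 0.2))
    ).toDF("label", "features")

    // Fit the Estimator, then check every output column combination.
    val model = new LogisticRegression().setMaxIter(10).setRegParam(0.1).fit(data)
    checkOutputColumns[LogisticRegressionModel](model, data)

    spark.stop()
  }
}
```

In a real test suite the helper would presumably take the Estimator as well and
be called from each ProbabilisticClassifier suite with that suite's own dataset.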
CC [~WeichenXu123] since this came up in
https://github.com/apache/spark/pull/17373