Joseph K. Bradley created SPARK-14604:
-----------------------------------------
Summary: Modify design of ML model summaries
Key: SPARK-14604
URL: https://issues.apache.org/jira/browse/SPARK-14604
Project: Spark
Issue Type: Improvement
Components: ML
Reporter: Joseph K. Bradley
Several spark.ml models now have summaries containing evaluation metrics and
training info:
* LinearRegressionModel
* LogisticRegressionModel
* GeneralizedLinearRegressionModel
Unfortunately, these summaries were added inconsistently. I propose
reorganizing them so that:
* Each model has one summary (without training info) and one training summary
(with info from training). The non-training summary can be produced for a new
dataset via {{evaluate}}.
* A summary does not store the model itself.
* A summary provides a transient reference to the dataset used to produce
it.
This task will involve reorganizing the GLM summary (which lacks a
training/non-training distinction) and deprecating the {{model}} method in
{{LinearRegressionSummary}}.
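
To make the proposed layout concrete, here is a minimal sketch in plain Scala.
It is not the existing spark.ml API: {{ToyDataset}}, {{ToyLinearModel}},
{{ToyRegressionSummary}} and {{ToyRegressionTrainingSummary}} are hypothetical
stand-ins (a real implementation would use a DataFrame and the actual model
classes), and the single RMSE metric is chosen only for illustration.

```scala
// Hypothetical stand-in for a Spark DataFrame: rows of (feature, label).
final case class ToyDataset(rows: Seq[(Double, Double)])

// Non-training summary: metrics for predictions made on some dataset.
// It holds no reference to the model, and its dataset reference is
// transient, so the dataset is skipped if the summary is serialized.
class ToyRegressionSummary(
    @transient val dataset: ToyDataset,
    val predictions: Seq[Double]) extends Serializable {

  // Root mean squared error, computed eagerly so that the metric
  // remains available even when the transient dataset is gone.
  val rootMeanSquaredError: Double = {
    val squaredErrors = dataset.rows.zip(predictions).map {
      case ((_, label), prediction) => math.pow(label - prediction, 2)
    }
    math.sqrt(squaredErrors.sum / squaredErrors.size)
  }
}

// Training summary: the same metrics plus information that only
// exists for the training run (here, an assumed objective history).
class ToyRegressionTrainingSummary(
    trainingData: ToyDataset,
    trainingPredictions: Seq[Double],
    val objectiveHistory: Seq[Double])
  extends ToyRegressionSummary(trainingData, trainingPredictions) {

  def totalIterations: Int = objectiveHistory.length
}

// Hypothetical model: evaluate() builds a non-training summary for
// any new dataset without the summary pointing back at the model.
class ToyLinearModel(val slope: Double, val intercept: Double) {
  def predict(x: Double): Double = slope * x + intercept

  def evaluate(dataset: ToyDataset): ToyRegressionSummary =
    new ToyRegressionSummary(
      dataset,
      dataset.rows.map { case (x, _) => predict(x) })
}
```

With this shape, {{evaluate}} can be called on any held-out dataset, while the
training summary is created once at fit time and carries the extra training
fields. Neither summary needs a {{model}} member.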