Joseph K. Bradley created SPARK-14604:
-----------------------------------------

             Summary: Modify design of ML model summaries
                 Key: SPARK-14604
                 URL: https://issues.apache.org/jira/browse/SPARK-14604
             Project: Spark
          Issue Type: Improvement
          Components: ML
            Reporter: Joseph K. Bradley


Several spark.ml models now have summaries containing evaluation metrics and 
training info:
* LinearRegressionModel
* LogisticRegressionModel
* GeneralizedLinearRegressionModel

Unfortunately, these summaries were added inconsistently.  I 
propose reorganizing them to have:
* For each model, one summary (without training info) and one training summary 
(with info from training).  The non-training summary can be produced for a new 
dataset via {{evaluate}}.
* A summary should not store the model itself.
* A summary should provide a transient reference to the dataset used to produce 
the summary.
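
The three points above can be illustrated with a minimal sketch in plain Python 
rather than Spark code.  Every name here ({{Summary}}, {{TrainingSummary}}, 
{{ToyLinearModel}}, {{objective_history}}) is hypothetical and is not the 
spark.ml API, and "transient" is only approximated by dropping the dataset from 
the serialized state.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, float]  # (feature, label): a stand-in for a DataFrame row


class Summary:
    """Evaluation metrics for one dataset. Holds no reference to the model."""

    def __init__(self, dataset: Sequence[Row], predictions: Sequence[float]):
        if len(dataset) == 0 or len(dataset) != len(predictions):
            raise ValueError("dataset and predictions must be non-empty and aligned")
        # Assumption: "transient" means the reference is not serialized.
        self.dataset: Optional[Sequence[Row]] = dataset
        squared_errors = [(p - y) ** 2 for p, (_, y) in zip(predictions, dataset)]
        self.mean_squared_error = sum(squared_errors) / len(squared_errors)

    def __getstate__(self):
        # Used by pickle/copy: keep the metrics, drop the dataset reference.
        state = dict(self.__dict__)
        state["dataset"] = None
        return state


class TrainingSummary(Summary):
    """A Summary plus information that only exists for the training run."""

    def __init__(self, dataset: Sequence[Row], predictions: Sequence[float],
                 objective_history: Sequence[float]):
        super().__init__(dataset, predictions)
        self.objective_history: List[float] = list(objective_history)
        self.total_iterations = len(self.objective_history)


class ToyLinearModel:
    """Hypothetical model: y = slope * x + intercept."""

    def __init__(self, slope: float, intercept: float):
        self.slope = slope
        self.intercept = intercept
        # Would be attached by the (hypothetical) training code.
        self.training_summary: Optional[TrainingSummary] = None

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept

    def evaluate(self, dataset: Sequence[Row]) -> Summary:
        # Produces a non-training summary for any new dataset.
        return Summary(dataset, [self.predict(x) for x, _ in dataset])
```

In this sketch the model points at its training summary, while neither summary 
class points back at the model, so serializing a summary does not drag the 
model or the dataset along with it.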

This task will involve reorganizing the GLM summary (which lacks a 
training/non-training distinction) and deprecating the {{model}} method in 
{{LinearRegressionSummary}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
