Joseph K. Bradley created SPARK-7674:
----------------------------------------

             Summary: R-like stats for ML models
                 Key: SPARK-7674
                 URL: https://issues.apache.org/jira/browse/SPARK-7674
             Project: Spark
          Issue Type: New Feature
          Components: ML
            Reporter: Joseph K. Bradley
            Assignee: Joseph K. Bradley
            Priority: Critical


This is an umbrella JIRA for supporting ML model summaries and statistics, 
following the example of R's summary() and plot() functions.

[Design doc|https://docs.google.com/document/d/1oswC_Neqlqn5ElPwodlDY4IkSaHAi0Bx6Guo_LvhHK8/edit?usp=sharing]

From the design doc:
{quote}
R and its well-established packages provide extensive functionality for 
inspecting a model and its results.  This inspection is critical to 
interpreting, debugging and improving models.

R is arguably a gold standard for a statistics/ML library, so this doc largely 
attempts to imitate it.  The challenge we face is supporting similar 
functionality, but on big (distributed) data.  Data size makes both efficient 
computation and meaningful displays/summaries difficult.

R model and result summaries generally take 2 forms:
* summary(model): Display text with information about the model and results on 
data
* plot(model): Display plots about the model and results

We aim to provide both of these types of information.  Visualization for the 
plottable results will not be supported in MLlib itself, but we can provide 
results in a form which can be plotted easily with other tools.
{quote}
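
As a rough illustration of the two forms described above, here is a minimal sketch in plain Python (not Spark code). All names in it (RegressionSummary, summarize, describe, residual_points) are hypothetical and are not taken from MLlib or the design doc. It assumes a regression model whose labels and predictions are already collected as local lists; a distributed version would need to compute the same quantities with aggregations instead.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence, Tuple


@dataclass
class RegressionSummary:
    """Hypothetical summary object for illustration; not an MLlib class."""
    num_instances: int
    rmse: float
    r2: float
    predictions: List[float]
    residuals: List[float]

    def describe(self) -> str:
        # Text output, in the spirit of R's summary(model).
        return (f"instances: {self.num_instances}\n"
                f"RMSE:      {self.rmse:.4f}\n"
                f"R^2:       {self.r2:.4f}")

    def residual_points(self) -> List[Tuple[float, float]]:
        # (prediction, residual) pairs: plain data that an external
        # plotting tool could draw as a residuals-vs-fitted plot.
        return list(zip(self.predictions, self.residuals))


def summarize(labels: Sequence[float],
              predictions: Sequence[float]) -> RegressionSummary:
    """Compute basic fit statistics from local labels and predictions."""
    if len(labels) == 0 or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    n = len(labels)
    residuals = [y - p for y, p in zip(labels, predictions)]
    ss_res = sum(r * r for r in residuals)
    mean_y = sum(labels) / n
    ss_tot = sum((y - mean_y) ** 2 for y in labels)
    # R^2 is undefined when all labels are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return RegressionSummary(n, sqrt(ss_res / n), r2,
                             list(predictions), residuals)
```

Calling summarize(labels, predictions).describe() gives the text form, while residual_points() returns plain tuples that could be handed to any plotting library, which matches the goal of keeping visualization outside MLlib itself.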




