Joseph K. Bradley created SPARK-7674:
----------------------------------------
Summary: R-like stats for ML models
Key: SPARK-7674
URL: https://issues.apache.org/jira/browse/SPARK-7674
Project: Spark
Issue Type: New Feature
Components: ML
Reporter: Joseph K. Bradley
Assignee: Joseph K. Bradley
Priority: Critical
This is an umbrella JIRA for supporting ML model summaries and statistics,
following the example of R's summary() and plot() functions.
[Design doc|https://docs.google.com/document/d/1oswC_Neqlqn5ElPwodlDY4IkSaHAi0Bx6Guo_LvhHK8/edit?usp=sharing]
From the design doc:
{quote}
R and its well-established packages provide extensive functionality for
inspecting a model and its results. This inspection is critical to
interpreting, debugging and improving models.
R is arguably a gold standard for a statistics/ML library, so this doc largely
attempts to imitate it. The challenge we face is supporting similar
functionality, but on big (distributed) data. Data size makes both efficient
computation and meaningful displays/summaries difficult.
R model and result summaries generally take 2 forms:
* summary(model): Display text with information about the model and results on
data
* plot(model): Display plots about the model and results
We aim to provide both of these types of information. Visualization for the
plottable results will not be supported in MLlib itself, but we can provide
results in a form which can be plotted easily with other tools.
{quote}
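As a rough illustration of the two forms described in the design doc excerpt, here is a minimal sketch in plain Python. It is not MLlib code: the names LinearModelSummary, summarize and describe are hypothetical, and it assumes a small linear model whose data fits in local memory. On distributed data the per-row residuals would presumably stay in a distributed dataset rather than a local list. The text output stands in for summary(model), and the residuals field stands in for results that another tool could plot.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LinearModelSummary:
    """Hypothetical summary object; not an actual MLlib class."""
    coefficients: List[float]
    intercept: float
    residuals: List[float]  # one value per row, ready to hand to a plotting tool
    r2: float

    def describe(self) -> str:
        """Text report, loosely in the spirit of R's summary(model)."""
        lines = [
            "Coefficients: " + ", ".join(f"{c:.4f}" for c in self.coefficients),
            f"Intercept: {self.intercept:.4f}",
            f"R^2: {self.r2:.4f}",
            f"Residuals: {len(self.residuals)} values",
        ]
        return "\n".join(lines)


def summarize(
    coefficients: Sequence[float],
    intercept: float,
    features: Sequence[Sequence[float]],
    labels: Sequence[float],
) -> LinearModelSummary:
    """Evaluate a fitted linear model on (features, labels) held in local memory."""
    predictions = [
        intercept + sum(c * x for c, x in zip(coefficients, row))
        for row in features
    ]
    residuals = [y - p for y, p in zip(labels, predictions)]
    mean_y = sum(labels) / len(labels)
    ss_tot = sum((y - mean_y) ** 2 for y in labels)
    ss_res = sum(r * r for r in residuals)
    # R^2 is undefined when all labels are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return LinearModelSummary(list(coefficients), intercept, residuals, r2)


if __name__ == "__main__":
    s = summarize([2.0], 0.0, [[1.0], [2.0], [3.0]], [2.0, 4.0, 7.0])
    print(s.describe())
    # s.residuals can now be passed to any external plotting library.
```

Keeping the plottable values as plain data on the summary object, rather than drawing them, mirrors the stated goal of leaving visualization to other tools.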