[ https://issues.apache.org/jira/browse/SPARK-7674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-7674.
------------------------------
Resolution: Done
> R-like stats for ML models
> --------------------------
>
> Key: SPARK-7674
> URL: https://issues.apache.org/jira/browse/SPARK-7674
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Joseph K. Bradley
> Assignee: Joseph K. Bradley
> Priority: Critical
>
> This is an umbrella JIRA for supporting ML model summaries and statistics,
> following the example of R's summary() and plot() functions.
> [Design doc|https://docs.google.com/document/d/1oswC_Neqlqn5ElPwodlDY4IkSaHAi0Bx6Guo_LvhHK8/edit?usp=sharing]
> From the design doc:
> {quote}
> R and its well-established packages provide extensive functionality for
> inspecting a model and its results. This inspection is critical to
> interpreting, debugging and improving models.
> R is arguably a gold standard for a statistics/ML library, so this doc
> largely attempts to imitate it. The challenge we face is supporting similar
> functionality, but on big (distributed) data. Data size makes both efficient
> computation and meaningful displays/summaries difficult.
> R model and result summaries generally take 2 forms:
> * summary(model): Display text with information about the model and results on data
> * plot(model): Display plots about the model and results
> We aim to provide both of these types of information. Visualization for the
> plottable results will not be supported in MLlib itself, but we can provide
> results in a form which can be plotted easily with other tools.
> {quote}
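The design doc says that data size makes efficient computation of summaries difficult. One common way to keep an R-style summary cheap on partitioned data is to compute small, mergeable sufficient statistics in each partition and then combine them, so raw rows never need to be collected. The Python sketch below shows this for a one-feature least-squares fit (y ~ x). It is illustrative only. `SuffStats` and `summarize` are hypothetical names, not part of Spark MLlib's API, and the umbrella issue's actual design may take a different approach.

```python
from dataclasses import dataclass
from functools import reduce
import math


@dataclass
class SuffStats:
    """Hypothetical mergeable sufficient statistics for simple linear regression y ~ x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0
    syy: float = 0.0

    def add(self, x: float, y: float) -> "SuffStats":
        # Fold one observation into the running sums (done locally in each partition).
        return SuffStats(self.n + 1, self.sx + x, self.sy + y,
                         self.sxx + x * x, self.sxy + x * y, self.syy + y * y)

    def merge(self, other: "SuffStats") -> "SuffStats":
        # Combine two partitions' sums; the result does not depend on merge order.
        return SuffStats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                         self.sxx + other.sxx, self.sxy + other.sxy, self.syy + other.syy)


def summarize(s: SuffStats) -> dict:
    """Derive R-like summary numbers from merged sums (assumes n > 2, non-constant x and y)."""
    sxx_c = s.sxx - s.sx * s.sx / s.n    # centered sum of squares of x
    sxy_c = s.sxy - s.sx * s.sy / s.n    # centered cross product of x and y
    tss = s.syy - s.sy * s.sy / s.n      # total sum of squares of y
    slope = sxy_c / sxx_c
    intercept = (s.sy - slope * s.sx) / s.n
    rss = max(tss - slope * sxy_c, 0.0)  # residual sum of squares, clamped against rounding
    sigma = math.sqrt(rss / (s.n - 2))   # residual standard error on n - 2 degrees of freedom
    se_slope = sigma / math.sqrt(sxx_c)  # standard error of the slope estimate
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1.0 - rss / tss,
        "residual_std_error": sigma,
        "slope_t_value": slope / se_slope if se_slope > 0 else math.inf,
    }


# Usage: each inner list stands in for one partition of a distributed dataset.
partitions = [[(1.0, 2.1), (2.0, 3.9)], [(3.0, 6.2), (4.0, 7.8)], [(5.0, 10.1)]]
per_partition = [reduce(lambda acc, xy: acc.add(*xy), part, SuffStats()) for part in partitions]
total = reduce(SuffStats.merge, per_partition, SuffStats())
print(summarize(total))
```

Only six numbers per partition cross the network here, regardless of how many rows each partition holds. Richer summaries (multiple features, standard errors for every coefficient) follow the same pattern with a Gram matrix in place of the scalar sums.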
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)