[https://issues.apache.org/jira/browse/SPARK-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027375#comment-15027375]
Timothy Hunter commented on SPARK-8517:
---------------------------------------
Here are a few high-level comments:
- Branding confusion between spark.mllib, spark.ml, and the union of the two.
Right now, when you land on the first page, it is hard to see the
difference.
- The focus of spark.ml is currently on pipelines; it should be on DataFrames.
That clearly separates it from spark.mllib, which is based on RDDs.
- Make pipelines a sub-concept of spark.ml (instead of saying that spark.ml
is pipelines). Say that you can build pipelines with spark.ml.
- Make sure that all algorithms in spark.ml have the same level of usability
as in spark.mllib. You should not be forced to build a pipeline to use a
single algorithm.
- Reorganize the spark.ml menu around goals rather than content. Users want to
solve problems (clustering, regression, classification), but we organize by
theoretical concepts (decision trees, ensembles, linear methods). We should
follow the approach of MLlib and scikit-learn:
{code}
- MLlib: machine learning on RDDs
  ...
- SparkML: machine learning with (Spark) DataFrames
  - General concepts and overview
  - Building and transforming features
  - Classification and regression
  - Clustering
  - Collaborative filtering
  - Chaining transforms with pipelines
  - Advanced: evaluation, import/export, developer APIs
  - Examples
{code}
Some pieces, such as dimensionality reduction, are missing from this outline.
Also, the scikit-learn guide has a more academic focus, splitting roughly into
supervised vs. unsupervised learning.
I will drill down into the individual sections with more detailed suggestions.
> Improve the organization and style of MLlib's user guide
> --------------------------------------------------------
>
> Key: SPARK-8517
> URL: https://issues.apache.org/jira/browse/SPARK-8517
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, ML, MLlib
> Reporter: Xiangrui Meng
> Assignee: Timothy Hunter
>
> The current MLlib user guide (and spark.ml's), especially the main page,
> doesn't have a nice style. We could update it and reorganize the content to
> make it easier to navigate.