[ 
https://issues.apache.org/jira/browse/SPARK-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027375#comment-15027375
 ] 

Timothy Hunter commented on SPARK-8517:
---------------------------------------

Here are a few high-level comments:
 - There is branding confusion between spark.mllib, spark.ml, and the union of 
the two. When you land on the first page of the guide, it is currently hard to 
see the difference.
 - The focus of spark.ml is currently on pipelines. It should be on DataFrames, 
which clearly separates it from spark.mllib, which is built on RDDs.
 - Make pipelines a sub-concept of spark.ml (instead of saying that spark.ml 
is pipelines). Say that you can build pipelines with spark.ml.
 - Make sure that all algorithms in spark.ml have the same level of usability 
as in mllib. You should not be forced to build a pipeline to use a single 
algorithm.
 - Reorganize the spark.ml menu around goals rather than content. Users want to 
solve problems (clustering, regression, classification), but we organize by 
theoretical concepts (decision trees, ensembles, linear methods). We should do 
as mllib and scikit-learn do:
{code}
- MLlib: machine learning on RDDs
...
- SparkML: machine learning with (Spark) Dataframes
  - General concepts and overview
  - Building and transforming features
  - Classification and Regression
  - Clustering
  - Collaborative filtering
  - Chaining transforms with pipelines
  - Advanced: Evaluation, import/export, developer APIs
  - Examples
{code}
Some pieces are missing from this outline, such as dimensionality reduction. 
Also, the scikit-learn guide has a more academic focus, splitting roughly into 
supervised vs. unsupervised learning.
I am going to drill down into the individual sections with more suggestions.

> Improve the organization and style of MLlib's user guide
> --------------------------------------------------------
>
>                 Key: SPARK-8517
>                 URL: https://issues.apache.org/jira/browse/SPARK-8517
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, ML, MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Timothy Hunter
>
> The current MLlib's user guide (and spark.ml's), especially the main page, 
> doesn't have a nice style. We could update it and re-organize the content to 
> make it easier to navigate.



