[ https://issues.apache.org/jira/browse/SPARK-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036704#comment-15036704 ]

Xiangrui Meng commented on SPARK-8517:
--------------------------------------

* We should only mention MLlib-specific types, like vectors and matrices. 
However, UDTs are not public, so this doesn't seem like a must to me.
* I think we can separate model selection from the basic concepts. But 
estimator/transformer/pipeline should be introduced together, and the simple 
text classification pipeline is not hard to read.
* We didn't add a link because it is tricky to decide which branch/tag to point 
to, and the release process validates the links in the user guide, so we 
dropped it. See SPARK-11336 and its PR.
* As a workaround, I usually add a field called "id" to avoid "Tuple1.apply":

{code}
// Each row is an (id, value) pair, so no Tuple1.apply wrapping is needed.
val data = Seq((0, -0.5), (1, -0.3), (2, 0.0), (3, 0.2))
val df = sqlContext.createDataFrame(data).toDF("id", "features")
{code}
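Why "Tuple1.apply" comes up at all: in the Spark 1.x Scala API, {{SQLContext.createDataFrame}} accepts a {{Seq[A]}} only when {{A <: Product}} (a tuple or case class), so a plain {{Seq[Double]}} has to be wrapped into one-element tuples first. The sketch below reproduces that constraint in plain Scala, without Spark; {{columnCount}} is a hypothetical helper standing in for the type bound, not a Spark API.

{code}
object ProductBoundSketch {
  // Hypothetical helper with the same `A <: Product` bound as
  // SQLContext.createDataFrame; it returns how many columns a row would have.
  def columnCount[A <: Product](data: Seq[A]): Int =
    data.headOption.map(_.productArity).getOrElse(0)

  def main(args: Array[String]): Unit = {
    val raw = Seq(-0.5, -0.3, 0.0, 0.2)

    // columnCount(raw) would not compile: Double is not a Product.

    // Option 1: wrap every value in Tuple1 to get one-column rows.
    val wrapped = raw.map(Tuple1.apply)

    // Option 2 (the workaround): pair each value with an "id".
    val withId = raw.zipWithIndex.map { case (x, i) => (i, x) }

    println(columnCount(wrapped)) // 1
    println(columnCount(withId))  // 2
  }
}
{code}

Wrapping with Tuple1 yields one column per row, while adding an "id" yields two, which is why {{toDF("id", "features")}} takes two column names.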

> Improve the organization and style of MLlib's user guide
> --------------------------------------------------------
>
>                 Key: SPARK-8517
>                 URL: https://issues.apache.org/jira/browse/SPARK-8517
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, ML, MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Timothy Hunter
>
> The current MLlib user guide (and spark.ml's), especially the main page, 
> doesn't have a nice style. We could update it and re-organize the content to 
> make it easier to navigate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
