[
https://issues.apache.org/jira/browse/SPARK-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036704#comment-15036704
]
Xiangrui Meng commented on SPARK-8517:
--------------------------------------
* We should only mention MLlib-specific types, such as vectors and matrices.
However, UDTs are not public, and this doesn't seem like a must to me.
* I think we can separate model selection from the basic concepts, but
estimator/transformer/pipeline should be introduced together, and the simple
text classification pipeline is not very complicated to read.
* We didn't put a link because it is tricky to decide which branch/tag to use.
The release process validates links in the user guide, so we dropped the link.
See SPARK-11336 and its PR.
* As a workaround, I usually add a field called "id" to avoid "Tuple1.apply":
{code}
// Each row is an (id, feature) tuple, so no Tuple1 wrapping is needed.
val data = Seq((0, -0.5), (1, -0.3), (2, 0.0), (3, 0.2))
val df = sqlContext.createDataFrame(data).toDF("id", "features")
{code}
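For comparison, here is a minimal sketch of the single-column form next to the "id" workaround. It assumes a Spark 1.x shell where `sqlContext` is already defined; the names `values`, `single`, and `withId` are illustrative only.

```scala
// Assumes a Spark 1.x shell, where `sqlContext` is already in scope.
val values = Seq(-0.5, -0.3, 0.0, 0.2)

// Single column: createDataFrame expects a Seq of Products, so each
// Double has to be wrapped with Tuple1.apply.
val single = sqlContext.createDataFrame(values.map(Tuple1.apply)).toDF("features")

// Workaround: pair each value with an index so the rows are ordinary
// (Int, Double) tuples and no wrapping is needed.
val withId = sqlContext.createDataFrame(
  values.zipWithIndex.map { case (v, i) => (i, v) }
).toDF("id", "features")
```

Both forms give a "features" column of doubles; the second only adds an integer "id" column alongside it.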
> Improve the organization and style of MLlib's user guide
> --------------------------------------------------------
>
> Key: SPARK-8517
> URL: https://issues.apache.org/jira/browse/SPARK-8517
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, ML, MLlib
> Reporter: Xiangrui Meng
> Assignee: Timothy Hunter
>
> MLlib's current user guide (and spark.ml's), especially the main page,
> doesn't have a nice style. We could update it and reorganize the content to
> make it easier to navigate.