Repository: spark
Updated Branches:
  refs/heads/master cf33a8628 -> d2493a203
[SPARK-18812][MLLIB] explain "Spark ML"

## What changes were proposed in this pull request?

There has been some confusion around "Spark ML" vs. "MLlib". This PR adds some FAQ-like entries to the MLlib user guide to explain "Spark ML" and reduce the confusion.

I checked the [Spark FAQ page](http://spark.apache.org/faq.html), which seems too high-level for the content here. So I added it to the MLlib user guide instead.

cc: mateiz

Author: Xiangrui Meng <[email protected]>

Closes #16241 from mengxr/SPARK-18812.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d2493a20
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d2493a20
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d2493a20

Branch: refs/heads/master
Commit: d2493a203e852adf63dde4e1fc993e8d11efec3d
Parents: cf33a86
Author: Xiangrui Meng <[email protected]>
Authored: Fri Dec 9 17:34:52 2016 -0800
Committer: Xiangrui Meng <[email protected]>
Committed: Fri Dec 9 17:34:52 2016 -0800

----------------------------------------------------------------------
 docs/ml-guide.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/d2493a20/docs/ml-guide.md
----------------------------------------------------------------------
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index ddf81be..9717619 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -35,6 +35,18 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin
 * The DataFrame-based API for MLlib provides a uniform API across ML algorithms and across multiple languages.
 * DataFrames facilitate practical ML Pipelines, particularly feature transformations. See the [Pipelines guide](ml-pipeline.html) for details.
 
+*What is "Spark ML"?*
+
+* "Spark ML" is not an official name but occasionally used to refer to the MLlib DataFrame-based API.
+  This is majorly due to the `org.apache.spark.ml` Scala package name used by the DataFrame-based API,
+  and the "Spark ML Pipelines" term we used initially to emphasize the pipeline concept.
+
+*Is MLlib deprecated?*
+
+* No. MLlib includes both the RDD-based API and the DataFrame-based API.
+  The RDD-based API is now in maintenance mode.
+  But neither API is deprecated, nor MLlib as a whole.
+
 # Dependencies
 
 MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on
