Hi Yu,

Thanks for bringing it up for clarification. Here's a rough draft of a section for the soon-to-be-updated programming guide, which will have more info on the spark.ml package.

Joseph

## spark.mllib vs. spark.ml

Spark 1.2 will include a new machine learning package called spark.ml, currently an alpha component but potentially a successor to spark.mllib. The spark.ml package aims to replace the old APIs with a cleaner, more uniform set of APIs which will help users create full machine learning pipelines. (More info about pipelines will be included in the updated programming guide for Spark 1.2.)

### Development plan

With Spark 1.2, spark.mllib is still the primary machine learning package, and spark.ml is an alpha component for testing the new API. The primary parts of this API are:

* the Pipeline concept for constructing complicated ML workflows consisting of Estimators and Transformers,
* SchemaRDD as an ML dataset,
* and constructs for specifying parameters for algorithms and pipelines.

If all goes well, spark.ml will become the primary ML package at the time of the Spark 1.3 release. Initially, simple wrappers will be used to port algorithms to spark.ml, but eventually, code will be moved to spark.ml and spark.mllib will be deprecated.

### Advice to developers

During the next development cycle, new algorithms should be contributed to spark.mllib. Optionally, wrappers for new (and old) algorithms can be contributed to spark.ml.

Users will be able to use algorithms from either of the two packages; the only difficulty will be the differences in APIs between the two packages.

On Thu, Nov 27, 2014 at 6:41 AM, Yu Ishikawa <yuu.ishikawa+sp...@gmail.com> wrote:

> Hi all,
>
> Spark ML alpha version exists in the current master branch on Github.
> If we want to add new machine learning algorithms or to modify algorithms
> which already exists,
> which package should we implement them at org.apache.spark.mllib or
> org.apache.spark.ml?
>
> thanks,
> Yu
>
> -----
> -- Yu Ishikawa
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Which-is-the-correct-package-to-add-a-new-algorithm-tp9540.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
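
To make the Pipeline idea from the draft a bit more concrete, here is a toy sketch in plain Python. It is not the spark.ml API: every class and method name below (`Transformer`, `Estimator`, `Pipeline`, `PipelineModel`, `Tokenizer`, `KeywordClassifier`, `KeywordModel`) is hypothetical and chosen only for illustration, and a list of dicts stands in for a SchemaRDD. The assumption it illustrates is that an Estimator learns from a dataset and yields a Transformer, while a Pipeline chains such stages in order.

```python
from abc import ABC, abstractmethod


class Transformer(ABC):
    """Hypothetical stage that maps one dataset (list of dict rows) to another."""

    @abstractmethod
    def transform(self, rows):
        ...


class Estimator(ABC):
    """Hypothetical stage that learns from a dataset and returns a Transformer."""

    @abstractmethod
    def fit(self, rows):
        ...


class Tokenizer(Transformer):
    """Adds a 'words' column by lowercasing and splitting the 'text' column."""

    def transform(self, rows):
        return [{**r, "words": r["text"].lower().split()} for r in rows]


class KeywordModel(Transformer):
    """Predicts 1.0 when a row contains any of the learned keywords."""

    def __init__(self, keywords):
        self.keywords = keywords

    def transform(self, rows):
        return [
            {**r, "prediction": 1.0 if self.keywords & set(r["words"]) else 0.0}
            for r in rows
        ]


class KeywordClassifier(Estimator):
    """Toy learner: keeps words seen only in positively labeled rows."""

    def fit(self, rows):
        positive, negative = set(), set()
        for r in rows:
            (positive if r["label"] == 1.0 else negative).update(r["words"])
        return KeywordModel(positive - negative)


class PipelineModel(Transformer):
    """A fitted pipeline: applies each fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Fits Estimator stages in order, feeding each stage's output to the next."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)  # an Estimator becomes a Transformer
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


training = [
    {"text": "Spark is fast", "label": 1.0},
    {"text": "Hadoop is slow", "label": 0.0},
]
model = Pipeline([Tokenizer(), KeywordClassifier()]).fit(training)
predictions = model.transform([{"text": "spark rocks"}, {"text": "slow job"}])
print([p["prediction"] for p in predictions])
```

The point of the design is that fitting the whole pipeline returns a single object that can be applied to new data, so feature preparation and the learned model travel together.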