[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520635#comment-16520635 ]

Joseph K. Bradley commented on SPARK-4591:
------------------------------------------

There are still a few contained tasks which are incomplete. I'd like to leave this open for now.

> Algorithm/model parity for spark.ml (Scala)
> -------------------------------------------
>
>                 Key: SPARK-4591
>                 URL: https://issues.apache.org/jira/browse/SPARK-4591
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> This is an umbrella JIRA for porting spark.mllib implementations to use the DataFrame-based API defined under spark.ml. We want to achieve critical feature parity for the next release.
>
> h3. Instructions for 3 subtask types
>
> *Review tasks*: detailed review of a subpackage to identify feature gaps between spark.mllib and spark.ml.
> * Should be listed as a subtask of this umbrella.
> * Review subtasks cover major algorithm groups. To pick up a review subtask, please:
> ** Comment that you are working on it.
> ** Compare the public APIs of spark.ml vs. spark.mllib.
> ** Comment on all missing items within spark.ml: algorithms, models, methods, features, etc.
> ** Check for existing JIRAs covering those items. If there is no existing JIRA, create one, and link it to your comment.
>
> *Critical tasks*: higher priority missing features which are required for this umbrella JIRA.
> * Should be linked as "requires" links.
>
> *Other tasks*: lower priority missing features which can be completed after the critical tasks.
> * Should be linked as "contains" links.
>
> h4. Excluded items
>
> This does *not* include:
> * Python: We can compare Scala vs. Python in spark.ml itself.
> * Moving linalg to spark.ml: [SPARK-13944]
> * Streaming ML: Requires stabilizing some internal APIs of structured streaming first
>
> h3. TODO list
>
> *Critical issues*
> * [SPARK-14501]: Frequent Pattern Mining
> * [SPARK-14709]: linear SVM
> * [SPARK-15784]: Power Iteration Clustering (PIC)
>
> *Lower priority issues*
> * Missing methods within algorithms (see Issue Links below)
> * evaluation submodule
> * stat submodule (should probably be covered in DataFrames)
> * Developer-facing submodules:
> ** optimization (including [SPARK-17136])
> ** random, rdd
> ** util
>
> *To be prioritized*
> * single-instance prediction: [SPARK-10413]
> * pmml [SPARK-11171]