[ https://issues.apache.org/jira/browse/SPARK-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-4591:
-------------------------------------
Description:
This is an umbrella JIRA for porting spark.mllib implementations to use the
DataFrame-based API defined under spark.ml. We want to achieve critical
feature parity for the next release.
h3. Instructions for 3 subtask types
*Review tasks*: detailed review of a subpackage to identify feature gaps
between spark.mllib and spark.ml.
* Should be listed as a subtask of this umbrella.
* Review subtasks cover major algorithm groups. To pick up a review subtask,
please:
** Comment that you are working on it.
** Compare the public APIs of spark.ml vs. spark.mllib.
** Comment on all missing items within spark.ml: algorithms, models, methods,
features, etc.
** Check for existing JIRAs covering those items. If there is no existing
JIRA, create one, and link it to your comment.
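The comparison step in a review subtask amounts to a set difference over public API members. A minimal sketch, assuming the member lists have already been collected by hand or from the API docs; the class and member names (AlgoA, AlgoB, train, setK, ...) and the helper find_gaps are hypothetical placeholders, not real spark.mllib or spark.ml listings:

```python
# Illustrative only: the class and member names below are placeholders,
# not a real listing of spark.mllib or spark.ml APIs.


def find_gaps(mllib_api, ml_api):
    """Return {class_name: sorted missing members} for items in mllib_api
    that are absent from ml_api.

    A class missing entirely from ml_api is reported with all its members.
    """
    gaps = {}
    for cls, members in mllib_api.items():
        missing = set(members) - set(ml_api.get(cls, ()))
        if missing:
            gaps[cls] = sorted(missing)
    return gaps


# Hypothetical inputs: public members per class in each package.
mllib_api = {
    "AlgoA": {"train", "predict", "save"},
    "AlgoB": {"run", "setK"},
}
ml_api = {
    "AlgoA": {"train", "predict"},
}

gaps = find_gaps(mllib_api, ml_api)
for cls, missing in sorted(gaps.items()):
    print(f"{cls}: missing {', '.join(missing)}")
```

Each entry in the result is a candidate item to comment on and to check against existing JIRAs.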
*Critical tasks*: higher-priority missing features that are required for this
umbrella JIRA.
* Should be linked as "requires" links.
*Other tasks*: lower-priority missing features that can be completed after the
critical tasks.
* Should be linked as "related to" links.
h4. Excluded items
This does *not* include Python. We can compare Scala vs. Python in spark.ml
itself.
This also excludes moving linalg to spark.ml: [SPARK-13944].
This does not include the following items (but could eventually):
* Streaming ML
* PMML
was:
This is an umbrella JIRA for porting spark.mllib implementations to use the
DataFrame-based API defined under spark.ml. We want to achieve critical
feature parity for the next release.
h4. 3 subtask types
*Review tasks*: detailed review of a subpackage to identify feature gaps
between spark.mllib and spark.ml.
* Should be listed as a subtask of this umbrella.
* Review subtasks cover major algorithm groups. To pick up a review subtask,
please:
** Comment that you are working on it.
** Compare the public APIs of spark.ml vs. spark.mllib.
** Comment on all missing items within spark.ml: algorithms, models, methods,
features, etc.
** Check for existing JIRAs covering those items. If there is no existing
JIRA, create one, and link it to your comment.
*Critical tasks*: higher priority missing features which are required for this
umbrella JIRA.
* Should be linked as "requires" links.
*Other tasks*: lower priority missing features which can be completed after the
critical tasks.
* Should be linked as "related to" links.
h4. Excluded items
This does *not* include Python. We can compare Scala vs. Python in spark.ml
itself.
This also excludes moving linalg to spark.ml: [SPARK-13944]
This does not include the following items (but could eventually):
* Streaming ML
* pmml
> Algorithm/model parity for spark.ml (Scala)
> -------------------------------------------
>
> Key: SPARK-4591
> URL: https://issues.apache.org/jira/browse/SPARK-4591
> Project: Spark
> Issue Type: Umbrella
> Components: ML
> Reporter: Xiangrui Meng
> Priority: Critical