[
https://issues.apache.org/jira/browse/SPARK-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054000#comment-15054000
]
Joseph K. Bradley commented on SPARK-7131:
------------------------------------------
Yes, I'm sorry about how long this has taken, but I have enough confidence in
the API now to proceed. I've created a JIRA for doing this in the next release:
[SPARK-12301], though I may not be able to look at this issue until January.
Please post your thoughts there, and ping me in early January if there is no
activity. Thank you!
> Move tree,forest implementation from spark.mllib to spark.ml
> ------------------------------------------------------------
>
> Key: SPARK-7131
> URL: https://issues.apache.org/jira/browse/SPARK-7131
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Affects Versions: 1.4.0
> Reporter: Joseph K. Bradley
> Assignee: Joseph K. Bradley
> Fix For: 1.5.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> We want to change and improve the spark.ml API for trees and ensembles, but
> we cannot change the old API in spark.mllib. To support the changes we want
> to make, we should move the implementation from spark.mllib to spark.ml. We
> will generalize and modify it, but will also ensure that we do not change the
> behavior of the old API.
> There are several steps to this:
> 1. Copy the implementation over to spark.ml and change the spark.ml classes
> to use that implementation, rather than calling the spark.mllib
> implementation. The current spark.ml tests will ensure that the 2
> implementations learn exactly the same models. Note: This should include
> performance testing to make sure the updated code does not have any
> regressions. --> *UPDATE*: I have run tests using spark-perf, and there were
> no regressions.
> 2. Remove the spark.mllib implementation, and make the spark.mllib APIs
> wrappers around the spark.ml implementation. The spark.ml tests will again
> ensure that we do not change any behavior.
> 3. Move the unit tests to spark.ml, and change the spark.mllib unit tests to
> verify model equivalence.
> This JIRA is now for step 1 only. Steps 2 and 3 will be in separate JIRAs.
> After these updates, we can more safely generalize and improve the spark.ml
> implementation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)