[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212428#comment-15212428 ]
Joseph K. Bradley commented on SPARK-13783:
-------------------------------------------
I'd prefer the approach [~GayathriMurali] mentioned, which is what
spark.mllib does. It should be more efficient because it takes better
advantage of columnar storage.
I do want us to save Params for each tree since that will be more robust to
future code changes (rather than re-creating them based on the GBT params).
However, that may require some code refactoring so that the GBT can get a set
of {{jsonParams}} for each tree. Given that, the GBT could store that JSON in
another DataFrame.
How does that sound?
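
To make the proposal concrete, here is a minimal sketch in plain Python
(not Spark code). It assumes the layout under discussion is one table of
node rows keyed by a tree ID, plus a second table holding one JSON string
of params per tree. The function names, the row shapes, and the example
param values are all hypothetical and only illustrate the idea; the real
implementation would use DataFrames and the existing single-tree save/load.

```python
import json

# Hypothetical in-memory stand-in for an ensemble: each tree has its own
# params and a list of nodes. Field names here are illustrative only.
example_trees = [
    {"params": {"maxDepth": 2, "minInstancesPerNode": 1},
     "nodes": [{"id": 0, "prediction": 0.5}, {"id": 1, "prediction": 0.1}]},
    {"params": {"maxDepth": 3, "minInstancesPerNode": 1},
     "nodes": [{"id": 0, "prediction": -0.2}]},
]


def tree_metadata_rows(trees):
    """One row per tree: (treeID, JSON string of that tree's params)."""
    return [(tree_id, json.dumps(tree["params"], sort_keys=True))
            for tree_id, tree in enumerate(trees)]


def node_rows(trees):
    """Nodes of all trees flattened into one table: (treeID, nodeID, prediction)."""
    rows = []
    for tree_id, tree in enumerate(trees):
        for node in tree["nodes"]:
            rows.append((tree_id, node["id"], node["prediction"]))
    return rows


def load_trees(metadata_rows, data_rows):
    """Rebuild each tree from its own saved params instead of the ensemble's."""
    trees = {tree_id: {"params": json.loads(params_json), "nodes": []}
             for tree_id, params_json in metadata_rows}
    for tree_id, node_id, prediction in data_rows:
        trees[tree_id]["nodes"].append({"id": node_id, "prediction": prediction})
    return [trees[tree_id] for tree_id in sorted(trees)]


metadata = tree_metadata_rows(example_trees)
data = node_rows(example_trees)
restored = load_trees(metadata, data)
```

Because every tree carries its own params row, loading does not need to
re-derive tree params from the GBT params, which is the robustness
argument made above.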
It may make sense to implement export/import for one ensemble model before the
other since both might require changes to the single-tree save/load. Would you
mind helping to review each other's work? Who would prefer to go first?
Thanks!

> Model export/import for spark.ml: GBTs
> --------------------------------------
>
> Key: SPARK-13783
> URL: https://issues.apache.org/jira/browse/SPARK-13783
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Joseph K. Bradley
>
> This JIRA is for both GBTClassifier and GBTRegressor. The implementation
> should reuse the one for DecisionTree*.