[ 
https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211392#comment-15211392
 ] 

Yanbo Liang commented on SPARK-13783:
-------------------------------------

GBTClassificationModel contains an array of DecisionTreeRegressionModel. For 
import/export, we have two options to discuss:

* #1 We iteratively call DecisionTreeRegressionModel.save() to save each 
DecisionTreeRegressionModel to a folder under "data/tree/" and load them 
iteratively using DecisionTreeRegressionModel.load(). We can reuse all the 
save/load functions of DecisionTree, and we can persist each DecisionTree's 
params, such as "numFeatures", which can be used to reconstruct the 
DecisionTreeRegressionModel. But with this option, we cannot store the GBT 
model in a single DataFrame.

* #2 We know that each DecisionTreeRegressionModel is stored as Seq[NodeData] 
in a column of a DataFrame. We can store the GBT as Seq[Seq[NodeData]]. But 
then we cannot save the params of each DecisionTreeRegressionModel. If the DT 
model later needs extra params for reconstruction, we would have to handle 
them specially.

I vote for #1 and look forward to other comments. [~josephkb]
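
To make the trade-off concrete, here is a minimal sketch of the two layouts in 
plain Python rather than Spark code. Everything in it is an assumption made 
for illustration: the dict-based tree, the helper names, and the file names 
(metadata.json, nodes.json, data.json) are hypothetical and are not the 
actual spark.ml persistence format.

```python
import json
from pathlib import Path


def make_tree(num_features, nodes):
    """Hypothetical stand-in for a DecisionTreeRegressionModel:
    per-tree params plus a flat list of node records (like Seq[NodeData])."""
    return {"params": {"numFeatures": num_features}, "nodes": nodes}


def save_option1(trees, root):
    """Option #1: one sub-folder per tree under data/tree/.
    Each tree keeps its own params next to its nodes."""
    for i, tree in enumerate(trees):
        tree_dir = Path(root) / "data" / "tree" / str(i)
        tree_dir.mkdir(parents=True)
        (tree_dir / "metadata.json").write_text(json.dumps(tree["params"]))
        (tree_dir / "nodes.json").write_text(json.dumps(tree["nodes"]))


def load_option1(root):
    """Load the trees back one by one, in their original order."""
    tree_root = Path(root) / "data" / "tree"
    # Sort numerically so that folder "10" does not come before folder "2".
    tree_dirs = sorted(tree_root.iterdir(), key=lambda p: int(p.name))
    return [
        {
            "params": json.loads((d / "metadata.json").read_text()),
            "nodes": json.loads((d / "nodes.json").read_text()),
        }
        for d in tree_dirs
    ]


def save_option2(trees, root):
    """Option #2: a single record shaped like Seq[Seq[NodeData]].
    Per-tree params are not written, so they are lost on reload."""
    all_nodes = [tree["nodes"] for tree in trees]
    (Path(root) / "data.json").write_text(json.dumps(all_nodes))


def load_option2(root):
    """Returns only the nested node lists; params must come from elsewhere."""
    return json.loads((Path(root) / "data.json").read_text())
```

In this sketch, option #1 round-trips each tree together with its params at 
the cost of many small files, while option #2 keeps everything in one record 
but returns only the node lists.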

> Model export/import for spark.ml: GBTs
> --------------------------------------
>
>                 Key: SPARK-13783
>                 URL: https://issues.apache.org/jira/browse/SPARK-13783
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is for both GBTClassifier and GBTRegressor.  The implementation 
> should reuse the one for DecisionTree*.


