[ https://issues.apache.org/jira/browse/SPARK-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211392#comment-15211392 ]
Yanbo Liang commented on SPARK-13783:
-------------------------------------

GBTClassificationModel contains an array of DecisionTreeRegressionModel. For import/export, we have two options for discussion:
* #1 We iteratively call DecisionTreeRegressionModel.save() to save each DecisionTreeRegressionModel to a folder under "data/tree/", and load them iteratively using DecisionTreeRegressionModel.load(). We can reuse all the save/load functions of DecisionTree, and we can persist each DecisionTree's params, such as "numFeatures", which can be used to reconstruct the DecisionTreeRegressionModel. But with this option, we cannot store the GBT model in a single DataFrame.
* #2 We know that each DecisionTreeRegressionModel is stored as Seq[NodeData] in a column of a DataFrame. We can store the GBT as Seq[Seq[NodeData]]. But we cannot save the params of each DecisionTreeRegressionModel. If the DT model later needs extra params for reconstruction, we would have to handle them specially.

I vote for #1 and look forward to other comments. [~josephkb]

> Model export/import for spark.ml: GBTs
> --------------------------------------
>
>                 Key: SPARK-13783
>                 URL: https://issues.apache.org/jira/browse/SPARK-13783
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is for both GBTClassifier and GBTRegressor. The implementation
> should reuse the one for DecisionTree*.
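To make the trade-off between the two storage options in the comment concrete, here is a minimal plain-Python sketch. It is not Spark code and does not use any Spark API: the `NodeData` fields, the `Tree` class, the JSON files, and all function names are hypothetical stand-ins chosen for illustration. It only shows the shape of each layout: option #1 writes one directory per tree under "data/tree/" and keeps per-tree params such as `num_features`, while option #2 writes a single nested list of node lists and therefore needs the params supplied again at load time.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import List


@dataclass
class NodeData:
    # Hypothetical, simplified per-node record; not Spark's actual schema.
    id: int
    prediction: float
    left_child: int   # -1 marks a leaf
    right_child: int  # -1 marks a leaf


@dataclass
class Tree:
    # Stand-in for a single regression tree: its nodes plus per-tree params.
    nodes: List[NodeData]
    num_features: int


def save_per_tree(trees, root):
    """Option #1: one sub-directory per tree under data/tree/, params included."""
    for i, tree in enumerate(trees):
        tree_dir = Path(root) / "data" / "tree" / str(i)
        tree_dir.mkdir(parents=True, exist_ok=True)
        (tree_dir / "tree.json").write_text(json.dumps(asdict(tree)))


def load_per_tree(root):
    """Load the trees back in index order, each with its own params."""
    base = Path(root) / "data" / "tree"
    trees = []
    # Sort numerically so that tree 10 does not come before tree 2.
    for tree_dir in sorted(base.iterdir(), key=lambda p: int(p.name)):
        raw = json.loads((tree_dir / "tree.json").read_text())
        nodes = [NodeData(**n) for n in raw["nodes"]]
        trees.append(Tree(nodes=nodes, num_features=raw["num_features"]))
    return trees


def save_nested(trees, root):
    """Option #2: a single record holding a list of node lists; params are dropped."""
    nested = [[asdict(n) for n in t.nodes] for t in trees]
    Path(root).mkdir(parents=True, exist_ok=True)
    (Path(root) / "data.json").write_text(json.dumps(nested))


def load_nested(root, num_features):
    """Per-tree params were not stored, so the caller has to supply them."""
    nested = json.loads((Path(root) / "data.json").read_text())
    return [Tree([NodeData(**n) for n in nodes], num_features) for nodes in nested]
```

In this sketch, `load_per_tree` needs nothing but the path, whereas `load_nested` has to be told `num_features`. That mirrors the concern raised for option #2: any param a tree needs for reconstruction must be handled separately.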