[
https://issues.apache.org/jira/browse/SPARK-45154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795716#comment-17795716
]
APeng Zhang commented on SPARK-45154:
-------------------------------------
[~oumarnour] I think you need to set the _seed_ param of CrossValidator.
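
To illustrate why the seed matters here: CrossValidator assigns rows to folds with a random draw, so two runs without a fixed seed can train and evaluate on different splits. The block below is a minimal pure-Python sketch of that effect, not Spark's implementation; the helper name `assign_folds` is hypothetical and exists only for this example.

```python
import random


def assign_folds(n_rows, num_folds, seed=None):
    """Assign each row index to a fold using a seeded random draw.

    Hypothetical helper for illustration only; it mimics the idea of
    splitting rows into folds by a uniform random value per row.
    """
    # seed=None lets Python pick its own entropy, so folds differ between runs.
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so the fold index is in 0..num_folds-1.
    return [int(rng.random() * num_folds) for _ in range(n_rows)]


# Same seed: identical fold assignment, hence comparable runs.
run_a = assign_folds(1000, 4, seed=0)
run_b = assign_folds(1000, 4, seed=0)

# Different seed: a different split of the same data.
run_c = assign_folds(1000, 4, seed=1)

print(run_a == run_b)
print(run_a == run_c)
```

In PySpark the corresponding change would be to pass `seed=0` (or any fixed value) to `CrossValidator(...)` alongside `numFolds`, in both the Spark 2 and Spark 3 jobs, before comparing the resulting trees.
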
> Pyspark DecisionTreeClassifier: results and tree structure in spark3 very
> different from that of the spark2 version on the same data and with the same
> hyperparameters.
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-45154
> URL: https://issues.apache.org/jira/browse/SPARK-45154
> Project: Spark
> Issue Type: Bug
> Components: ML, MLlib, PySpark, Spark Core
> Affects Versions: 3.0.0, 3.3.1, 3.2.4, 3.3.3, 3.3.2, 3.4.0, 3.4.1
> Reporter: Oumar Nour
> Priority: Critical
> Labels: decisiontree, pyspark3, spark2, spark3
>
> Hello,
> I have an engine running on Spark 2 that trains a DecisionTreeClassifier model
> with CrossValidator.
>
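> {code:python}
> from pyspark.ml.classification import DecisionTreeClassifier
> from pyspark.ml.evaluation import BinaryClassificationEvaluator
> from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
>
> dt = DecisionTreeClassifier(maxBins=10000, seed=0)
> cv_dt_evaluator = BinaryClassificationEvaluator(
>     metricName="",  # left empty in the original report
>     rawPredictionCol="probability")
> # Create param grid and cross validator for model selection
> dt_grid = ParamGridBuilder()\
>     .addGrid(
>         dt.minInstancesPerNode, [100]
>     )\
>     .addGrid(
>         dt.maxDepth, [10]
>     )\
>     .build()
> cv = CrossValidator(
>     estimator=dt, estimatorParamMaps=dt_grid,
>     evaluator=cv_dt_evaluator,
>     parallelism=4,  # comma was missing here in the original snippet
>     numFolds=4
> ){code}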
>
> I want to {*}migrate from spark2 to spark3{*}. I ran
> *DecisionTreeClassifier* on the same data with the same parameter values, but
> my results are {*}completely different, especially in terms of
> tree structure{*}: the trees on Spark 3 are shallower and have fewer splits.
> I have read the documentation but have not found an explanation for this
> difference.
>
> Can you help me find a solution to this problem?
> Thanks in advance for your help
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)