[ https://issues.apache.org/jira/browse/SPARK-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Cutler updated SPARK-27293:
---------------------------------
    Summary: Setting random seed produces different results in RandomForestRegressor  (was: I am interested in finding out if there is a bug in the implementation of RandomForests. The Issue is when applying a seed and getting different results than other people from my class when applying it to the same data.)

> Setting random seed produces different results in RandomForestRegressor
> -----------------------------------------------------------------------
>
>                 Key: SPARK-27293
>                 URL: https://issues.apache.org/jira/browse/SPARK-27293
>             Project: Spark
>          Issue Type: Question
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Martin Skauen
>            Priority: Major
>
> I am calculating the RMSE metric like this:
> {code:java}
> from pyspark.ml.regression import RandomForestRegressor
> from pyspark.ml.evaluation import RegressionEvaluator
>
> # Split the data 70/30 with a fixed seed
> (trainingData, testData) = data.randomSplit([0.7, 0.3], 313)
> rfr = RandomForestRegressor(labelCol="labels", featuresCol="features",
>                             maxDepth=5, numTrees=3, seed=313)
> # Assumed step, missing from the snippet as posted: fit the model and predict
> model = rfr.fit(trainingData)
> predictions = model.transform(testData)
> evaluator = RegressionEvaluator(labelCol="labels", predictionCol="prediction",
>                                 metricName="rmse")
> rmse = evaluator.evaluate(predictions)
> print("RMSE = %g " % rmse)
> {code}
> I am setting the seed. For seed = 50, and for other seeds, I get exactly the
> same RMSE as the other people in my class. When I set the seed to 313, I get a
> different value. What could be the issue here?
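
One possible cause, not confirmed anywhere in this thread: a seed fixes the sequence of random draws, but not which row receives which draw. If the same data reaches the sampler in a different arrangement (in Spark, for example, with different partitioning), the same seed can yield a different train/test split and therefore a different RMSE. The sketch below illustrates that general point in plain Python rather than Spark; `seeded_split` is a hypothetical helper written for this illustration, not a PySpark API, and it is not how `randomSplit` is implemented.

```python
import random


def seeded_split(rows, train_fraction, seed):
    """Assign each row to train or test using one seeded draw per row, in row order."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        # The n-th row always receives the n-th draw, so the result
        # depends on the order in which rows arrive.
        if rng.random() < train_fraction:
            train.append(row)
        else:
            test.append(row)
    return train, test


rows = list(range(20))
reordered = list(reversed(rows))  # same data, different arrangement

train_a, test_a = seeded_split(rows, 0.7, seed=313)
train_b, test_b = seeded_split(rows, 0.7, seed=313)
train_c, test_c = seeded_split(reordered, 0.7, seed=313)

# Same seed and same row order: the split is reproducible.
print("same order, same split:", train_a == train_b)
# Same seed but different row order: the split is not guaranteed to match.
print("different order, same split:", sorted(train_a) == sorted(train_c))
```

Under this assumption, a useful check would be to compare the actual rows in `trainingData` across machines, not only the seed value.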