[ https://issues.apache.org/jira/browse/SPARK-33592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weichen Xu reassigned SPARK-33592:
----------------------------------

    Assignee: Weichen Xu

> Pyspark ML Validator writer may lost params in estimatorParamMaps
> -----------------------------------------------------------------
>
>                 Key: SPARK-33592
>                 URL: https://issues.apache.org/jira/browse/SPARK-33592
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Weichen Xu
>            Assignee: Weichen Xu
>            Priority: Major
>
> Two typical cases reproduce it:
> (1)
> {code:python}
> from pyspark.ml import Pipeline
> from pyspark.ml.classification import LogisticRegression
> from pyspark.ml.evaluation import MulticlassClassificationEvaluator
> from pyspark.ml.feature import HashingTF, Tokenizer
> from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
>
> tokenizer = Tokenizer(inputCol="text", outputCol="words")
> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
> lr = LogisticRegression()
> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
> # The grid tunes params that belong to stages nested inside the pipeline.
> paramGrid = ParamGridBuilder() \
>     .addGrid(hashingTF.numFeatures, [10, 100]) \
>     .addGrid(lr.maxIter, [100, 200]) \
>     .build()
> tvs = TrainValidationSplit(estimator=pipeline,
>                            estimatorParamMaps=paramGrid,
>                            evaluator=MulticlassClassificationEvaluator())
> # tvsPath is a writable output path.
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Checking `loadedTvs.getEstimatorParamMaps()` then shows that the tuning params
> `hashingTF.numFeatures` and `lr.maxIter` are lost.
> (2)
> {code:python}
> from pyspark.ml.classification import LogisticRegression, OneVsRest
> from pyspark.ml.evaluation import MulticlassClassificationEvaluator
> from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
>
> lr = LogisticRegression()
> ova = OneVsRest(classifier=lr)
> # The grid tunes a param of the classifier wrapped by OneVsRest.
> grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
> evaluator = MulticlassClassificationEvaluator()
> tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid,
>                            evaluator=evaluator)
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Checking `loadedTvs.getEstimatorParamMaps()` then shows that the tuning
> param `lr.maxIter` is lost.
> Both CrossValidator and TrainValidationSplit have this issue.
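
To check whether a reloaded validator has lost entries in `estimatorParamMaps`, the maps can be compared before and after the save/load round trip. The following is a minimal sketch, not part of the original report. The helper names `summarize_param_maps` and `lost_params` are hypothetical, and `FakeParam` is a stand-in used only so the example runs without Spark. It assumes that real `pyspark.ml.param.Param` objects can be matched on their `parent` (owner uid) and `name` attributes, and that uids are preserved across save/load.

```python
from collections import namedtuple
from itertools import zip_longest

# Stand-in for pyspark.ml.param.Param (assumption: only `parent` and `name` are needed).
FakeParam = namedtuple("FakeParam", ["parent", "name"])


def summarize_param_maps(param_maps):
    """Turn a list of {Param: value} dicts into sorted (parent, name, value) lists."""
    return [
        sorted((p.parent, p.name, v) for p, v in pm.items())
        for pm in param_maps
    ]


def lost_params(original_maps, loaded_maps):
    """For each grid point, list the entries present originally but missing after reload."""
    orig = summarize_param_maps(original_maps)
    loaded = summarize_param_maps(loaded_maps)
    return [
        [entry for entry in o if entry not in l]
        for o, l in zip_longest(orig, loaded, fillvalue=[])
    ]


# Illustrative data only: one grid point whose params vanish after reload.
original = [{
    FakeParam("HashingTF_1", "numFeatures"): 10,
    FakeParam("LogisticRegression_1", "maxIter"): 100,
}]
loaded = [{}]
missing = lost_params(original, loaded)
```

With real objects, the call would look like `lost_params(tvs.getEstimatorParamMaps(), loadedTvs.getEstimatorParamMaps())`; a non-empty inner list marks a grid point that lost tuning params.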