[ https://issues.apache.org/jira/browse/SPARK-29235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthew Bedford updated SPARK-29235:
------------------------------------
    Description:
Right after a CrossValidatorModel is trained, it has avgMetrics. After the model is written to disk and read back, avgMetrics comes back as an empty list.

To reproduce:

{{from pyspark.ml.tuning import CrossValidator, CrossValidatorModel}}
{{cv = CrossValidator(...) #fill with params}}
{{cvModel = cv.fit(trainDF) #given dataframe with training data}}
{{print(cvModel.avgMetrics) #prints a nonempty list as expected}}
{{cvModel.write().save("/tmp/model")}}
{{cvModel2 = CrossValidatorModel.read().load("/tmp/model")}}
{{print(cvModel2.avgMetrics) #BUG - prints an empty list}}

> CrossValidatorModel.avgMetrics disappears after model is written/read again
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-29235
>                 URL: https://issues.apache.org/jira/browse/SPARK-29235
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.1
>         Environment: Databricks cluster:
> {
>     "num_workers": 4,
>     "cluster_name": "mabedfor-test-classfix",
>     "spark_version": "5.3.x-cpu-ml-scala2.11",
>     "spark_conf": {
>         "spark.databricks.delta.preview.enabled": "true"
>     },
>     "node_type_id": "Standard_DS12_v2",
>     "driver_node_type_id": "Standard_DS12_v2",
>     "ssh_public_keys": [],
>     "custom_tags": {},
>     "spark_env_vars": {
>         "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
>     },
>     "autotermination_minutes": 120,
>     "enable_elastic_disk": true,
>     "cluster_source": "UI",
>     "init_scripts": [],
>     "cluster_id": "0722-165622-calls746"
> }
>            Reporter: Matthew Bedford
>            Priority: Minor
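Until the reloaded model carries avgMetrics, one possible stopgap is to persist the list yourself next to the saved model. The sketch below is an assumption-laden illustration, not part of Spark: the helper names save_avg_metrics and load_avg_metrics and the file name avgMetrics.json are hypothetical. It assumes only that cvModel.avgMetrics is a plain list of floats and that the sidecar path is on a filesystem the driver can write to.

```python
import json
import os
import tempfile


def save_avg_metrics(path, avg_metrics):
    """Write the metrics list to a JSON sidecar file (hypothetical helper)."""
    with open(path, "w") as f:
        # Cast to float so NumPy scalars, if any, serialize cleanly.
        json.dump([float(m) for m in avg_metrics], f)


def load_avg_metrics(path):
    """Read the metrics list back from the JSON sidecar file (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    # Stand-in values; in practice pass cvModel.avgMetrics right after cv.fit(...).
    metrics = [0.81, 0.79]
    sidecar = os.path.join(tempfile.mkdtemp(), "avgMetrics.json")
    save_avg_metrics(sidecar, metrics)
    print(load_avg_metrics(sidecar))
```

In the reproduction above, save_avg_metrics would be called right after cvModel.write().save(...), and load_avg_metrics right after CrossValidatorModel.read().load(...). The sidecar is kept in a separate location because the model path on a cluster may point at distributed storage rather than the driver's local disk.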