Ian Markowitz created SPARK-53507:
-------------------------------------

             Summary: Add Breaking Change info to Spark error classes
                 Key: SPARK-53507
                 URL: https://issues.apache.org/jira/browse/SPARK-53507
             Project: Spark
          Issue Type: Task
          Components: Spark Core
    Affects Versions: 4.1.0
            Reporter: Ian Markowitz
Users of Apache Spark often have their jobs break when upgrading to a new version. We'd like to improve this using config flags and a concept called "Breaking Change Info".

This is an example of a breaking change:
- Since Spark 4.1, `mapInPandas` and `mapInArrow` enforce strict validation of the result against the schema. The column names must match exactly, and the types must match with compatible nullability. To restore the previous behavior, set `spark.sql.execution.arrow.pyspark.validateSchema.enabled` to `false`.

This can be mitigated as follows:
* When the breaking change is introduced, we define an error class with a `breakingChangeInfo` object. This includes a migration message, a Spark config, and a flag indicating whether the mitigation can be applied automatically. Example:
```
"MAP_VALIDATION_ERROR": {
  "message": [
    "Result validation failed: The schema does not match the expected schema."
  ],
  "breakingChangeInfo": {
    "migrationMessage": [
      "To disable strict result validation, set `spark.sql.execution.arrow.pyspark.validateSchema.enabled` to `false`."
    ],
    "mitigationSparkConfig": {
      "key": "spark.sql.execution.arrow.pyspark.validateSchema.enabled",
      "value": "false"
    },
    "autoMitigation": true
  }
}
```
* In the Spark code, whenever this particular breaking change is hit, we always throw an error with the matching error class.
* A platform running the Spark job can handle this error by re-running the job with the specified config applied. This enables the platform to automatically retry the job with the breaking change mitigated.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
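To make the proposed retry flow concrete, here is a minimal Python sketch of what a platform-side handler could look like. Everything here is a hypothetical stand-in: `SparkJobError`, the `submit_job` callable, and the assumption that the error surfaces the parsed `breakingChangeInfo` block are illustrations of the idea, not an existing Spark API.

```python
# Hypothetical sketch: SparkJobError, submit_job, and the payload shape are
# stand-ins for whatever a real platform surfaces from the Spark error class.

class SparkJobError(Exception):
    """Carries the error class name and its parsed error-class JSON entry."""
    def __init__(self, error_class, payload):
        super().__init__(error_class)
        self.error_class = error_class
        self.payload = payload


def extract_mitigation(payload):
    """Return the mitigation (key, value) config pair if the error carries
    an auto-mitigable breakingChangeInfo block, else None."""
    info = payload.get("breakingChangeInfo") or {}
    cfg = info.get("mitigationSparkConfig")
    if info.get("autoMitigation") and cfg:
        return cfg["key"], cfg["value"]
    return None


def run_with_auto_mitigation(submit_job, confs=None):
    """Run the job once; if it fails with an auto-mitigable breaking change,
    retry exactly once with the mitigation config applied."""
    confs = dict(confs or {})
    try:
        return submit_job(confs)
    except SparkJobError as e:
        mitigation = extract_mitigation(e.payload)
        if mitigation is None:
            raise  # not a breaking-change error, or not auto-mitigable
        key, value = mitigation
        confs[key] = value
        return submit_job(confs)  # retry with the mitigation applied
```

The retry is deliberately limited to a single attempt with only the config named by the error, so the platform never loops or applies mitigations the error class did not ask for.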