Hi dev,

This is a spin-off of the original thread "Deprecating and banning `spark.databricks.*` config from Apache Spark repository" (link: <https://lists.apache.org/thread/qwxb21g5xjl7xfp4rozqmg1g0ndfw2jd>).
From the original thread, we decided to deprecate the config in Spark 3.5.5 and remove it in Spark 4.0.0. One thing that thread did not decide is the smooth migration logic.

We "persist" the config into the offset log for a streaming query, since the value of the config must be consistent during the lifecycle of the query. This means the problematic config is already persisted for any streaming query that ever ran with Spark 3.5.4.

For the migration, we re-assign the value of the problematic config to the new config. This happens when the query is restarted, and it is reflected in the offset log for newer batches, so after a couple of new microbatches the migration logic is no longer needed. This migration logic ships in Spark 3.5.5, so once the query has run with Spark 3.5.5 for a couple of microbatches, the issue is mitigated.

That said, there will always be users who bump the minor/major version without going through all the bugfix versions. I think it is still dangerous to remove the migration logic in Spark 4.0.0 (and probably Spark 4.1.0, depending on the discussion). In the migration logic, the problematic config is just a "string"; users would not be able to set a value with the problematic config name. We don't document this, as it is done automatically.

Therefore, I'd propose keeping the migration logic for the Spark 4.0 version line (at minimum; 4.1 is debatable). This gives users a safer and less burdensome migration path, at the cost of only retaining a problematic "string" (again, not a config).

I'd love to hear the community's voice on this. As a reminder, this is a blocker for Spark 4.0.0.

Thanks,
Jungtaek Lim (HeartSaVioR)
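To make the intent concrete, here is a minimal sketch of the migration logic described above: on query restart, the value persisted under the deprecated key in the offset log is re-assigned to the new key, so newer microbatches persist only the new key. The config names and the function below are illustrative placeholders, not the actual Spark internals.

```python
# Hypothetical keys for illustration only -- not the real config names.
DEPRECATED_KEY = "spark.databricks.example.conf"  # retained only as a plain string
REPLACEMENT_KEY = "spark.example.conf"

def migrate_offset_log_confs(confs):
    """Sketch: rewrite confs read from the offset log on query restart.

    If the deprecated key is present, move its value to the replacement
    key so that newer microbatches write only the replacement key.
    """
    if DEPRECATED_KEY in confs:
        confs = dict(confs)  # don't mutate the caller's mapping
        confs[REPLACEMENT_KEY] = confs.pop(DEPRECATED_KEY)
    return confs

# A query that last ran on Spark 3.5.4 would carry the deprecated key;
# after restart, only the replacement key remains in newer batches.
migrated = migrate_offset_log_confs({DEPRECATED_KEY: "true"})
```

After a couple of microbatches every offset-log entry read going forward was written with the new key, which is why the email argues the retained "string" is the only residue of the old config.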