Technically, there is no agreement here. In other words, we are in the same situation as in the initial discussion thread, where we couldn't build a community consensus on this.

> I will consider this as "lazy consensus" if there are no objections
> for 3 days from initiation of the thread.

If you need an explicit veto, here is mine, -1, because I don't think that's just a string.

> the problematic config is just a "string",

To be clear, as I proposed both in the PR comments and initial discussion thread, I believe we had better keep the AS-IS `master` and `branch-4.0` and recommend to upgrade to the latest version of Apache Spark 3.5.x first before upgrading to Spark 4.

Sincerely,
Dongjoon.

On Tue, Mar 4, 2025 at 8:37 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Bumping on this. Again, this is a blocker for Spark 4.0.0. I will consider
> this as "lazy consensus" if there are no objections for 3 days from
> initiation of the thread.
>
> On Tue, Mar 4, 2025 at 2:15 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> Hi dev,
>>
>> This is a spin-up of the original thread "Deprecating and banning
>> `spark.databricks.*` config from Apache Spark repository". (link
>> <https://lists.apache.org/thread/qwxb21g5xjl7xfp4rozqmg1g0ndfw2jd>)
>>
>> From the original thread, we decided to deprecate the config in Spark
>> 3.5.5 and remove the config in Spark 4.0.0. That thread did not decide one
>> thing, about smooth migration logic.
>>
>> We "persist" the config into offset log for streaming query since the
>> value of the config must be consistent during the lifecycle of the query.
>> This means, the problematic config is already persisted for streaming query
>> which ever ran with Spark 3.5.4.
>>
>> For the migration logic, we re-assign the value of the problematic config
>> to the new config. This happens when the query is restarted, and it will be
>> reflected into an offset log for "newer batch" so after a couple new
>> microbatches the migration logic isn't needed. This migration logic is
>> shipped in Spark 3.5.5, so once the query is run with Spark 3.5.5 for a
>> couple microbatches, it will be mitigated.
>>
>> But I would say that there will always be a case that users just bump the
>> minor/major version without following all the bugfix versions. I think it
>> is still dangerous to remove the migration logic in Spark 4.0.0 (and
>> probably Spark 4.1.0, depending on the discussion). From the migration
>> logic, the problematic config is just a "string", and users wouldn't be
>> able to set the value with the problematic config name. We don't document
>> this, as it'll be done automatically.
>>
>> That said, I'd propose to have migration logic for Spark 4.0 version line
>> (at minimum, 4.1 is debatable). This will give a safer and less burden
>> migration path for users with just retaining a problematic "string" (again,
>> not a config).
>>
>> I'd love to hear the community's voice on this. I'd like to remind you,
>> this is a blocker for Spark 4.0.0.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>