I don't understand the problem with keeping migration logic around for a long time, just in case. Who cares? It's a small check buried somewhere in the streaming code, much like a deprecation warning. There is no ASF policy compelling the removal of such logic; you are not _required_ to remove deprecated code at a major version or anything like that. Not removing it in 4.0 doesn't mean you can't remove it in 5.0, or even 4.2, or whatever.
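To be concrete, the shim being debated amounts to a few lines of key
translation when reading old checkpoint metadata, something like the sketch
below (the object and key names are illustrative, not the actual Spark code):

    // Hypothetical sketch of a legacy-config compatibility check: rewrite a
    // vendor-prefixed key written by Spark 3.5.4 into its canonical name
    // when loading checkpoint metadata. All names here are made up for
    // illustration.
    object LegacyConfShim {
      private val LegacyKey    = "spark.databricks.sql.streaming.someFlag"
      private val CanonicalKey = "spark.sql.streaming.someFlag"

      // Rewrite the legacy key if present; leave everything else untouched.
      def translate(confs: Map[String, String]): Map[String, String] =
        confs.get(LegacyKey) match {
          case Some(value) => confs - LegacyKey + (CanonicalKey -> value)
          case None        => confs
        }
    }

Keeping such a check costs essentially nothing at runtime, and removing it
later is a one-line deletion.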
If there's no valid technical or policy reason offered to prohibit the
migration logic, and there's a pretty clear technical reason to keep it (not
breaking queries), then I just don't see what the question is. Keep it.

On Fri, Mar 7, 2025 at 1:47 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> I have to cast -1 (despite non-binding) for every single RC for Spark
> 4.0.0 till this is settled, since I don't agree with the current status
> (Dongjoon's proposal as-is).
>
> On the other hand, I want to unblock this and stop holding up the RC
> phase. Again, I can only be persuaded if this is mandated by the ASF.
> There are a couple of ways to confirm:
>
> 1) Escalate this to the relevant ASF mailing list (we will need to figure
> out "where", since I haven't had to take such a discussion beyond the
> project before).
> 2) Find the ASF document stating the policy on vendor names in the
> codebase, and what action PMC members/committers need to take.
>
> That document should contain evidence that we must pay "any" cost to fix
> this, e.g. breaking users' queries. It's not justified if it simply says
> "remove it".
>
> I don't know whether 2) exists, so if anyone can find it, that would be
> awesome. If anyone can help figure out where to post to seek voices for
> 1), that would also be awesome.
>
> We can't wait forever on the above, so if no one has a strong voice on
> this, I'm going to start a VOTE with my proposal early next week, starting
> with retaining the migration code in 4.0.x, and in 4.1.x if the VOTE for
> 4.0.x succeeds.
>
> On Thu, Mar 6, 2025 at 5:54 AM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> I think this is a question of how to handle deprecation and removal.
>>
>> If we keep the migration path through Spark 4.1.x, the upgrade path will
>> be more than a year long: from our release cadence, Spark 4.2.0 would
>> probably be released in March next year or later, and Spark 3.5.4 was
>> released in December last year. The Spark 4.0.x window alone may not be
>> very long, but it still provides a 6+ month upgrade path.
>>
>> I'm not saying we should keep it forever. I'm saying we should try to
>> reduce the probability of breakage, the same way projects handle
>> deprecation and removal while trying to minimize impact. I see you are
>> predicting the number of affected users to be small, but that doesn't
>> mean we are free to do nothing and leave them broken and dissatisfied
>> with the project.
>>
>> I see this being compared to a "security fix" when we talk about
>> severity, but a security fix does not restrict the upgrade path, so what
>> we are about to do is much worse than that. I'm trying to make it a lot
>> less bad.
>>
>> I'm doing my best to care about users. Upgrading is not just "one
>> click", even for bugfix versions.
>>
>> On Thu, Mar 6, 2025 at 1:56 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>> wrote:
>>
>>> Let me reformulate your suggestions and my interpretation.
>>>
>>> Option 1: "Add back `spark.databricks.*` to the Spark codebase and keep
>>> it forever"
>>>
>>> If we follow the proposed logic and reasoning, there is no safe version
>>> in which to remove that configuration, because Apache Spark 3.5.4 users
>>> can technically jump to any future release, like Spark 4.1.0, 4.2.0, or
>>> 5.0.0. In other words, we can never remove that logic.
>>>
>>> That's the reason why we couldn't reach an agreement so far.
>>>
>>> Option 2 is simply adding a sentence (or a more accurate one) about
>>> Spark 3.5.4 to the Spark 4.0.0 migration guide, because all other Spark
>>> versions (except 3.5.4) are not contaminated by the `spark.databricks.*`
>>> conf:
>>>
>>> "For Spark 3.5.4 streaming jobs, if you want to migrate existing
>>> running jobs, you need to upgrade them to Spark 3.5.5+ before upgrading
>>> to Spark 4.0."
>>>
>>> Dongjoon.