FWIW, I am +1 on the proposal (though I missed the vote on this!)

Regards,
Mridul
On Fri, Mar 14, 2025 at 1:31 AM Mridul Muralidharan <mri...@gmail.com> wrote:

I agree with Mark; IMO this is a qualified veto. We should give Dongjoon the opportunity to give his clarification, if any.

I do realize this delays the RC process, but this deserves to be looked into carefully.

Thanks,
Mridul

On Thu, Mar 13, 2025 at 9:35 PM Mark Hamstra <markhams...@gmail.com> wrote:

Absolutely not!

This is clearly a vote on a code change, not on a procedural issue or a package release. The code change has been vetoed by a -1 vote by a qualified voter.

On Thu, Mar 13, 2025 at 6:58 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

As I said, I'm concluding the VOTE since we meet the criteria (3 binding +1s, 1 binding -1, and also non-binding +1s).

As I explained, I don't consider the -1 a veto, because there would be multiple -1s if we held a VOTE on the current codebase. (A +1 in this proposal is effectively a -1 in another proposal.)

The vote followed the Apache Voting Process for the "package release" type (which we tend to use in dev@ for VOTEs). I guess it could also have been done as a "procedural issues" vote, which is less strict, but this vote fulfills both types, which should be OK.

The current codebase "accidentally" represents another proposal, which was never intended. I don't see a way I can -1 the current codebase, and, to be fair, making a different change wouldn't be bound to any proposal either.

I don't want to block the release because of the above. So, let's change the current codebase the way we discussed and voted on here. Reverting this decision should require another VOTE.

Thanks to everyone who voted!

On Thu, Mar 13, 2025 at 4:54 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

Thanks to everyone who participated and voted!
Now I can technically conclude the VOTE, but I'm willing to wait till US daytime tomorrow to give Dongjoon some time to revisit this.

I'll conclude the vote around 6PM PST tomorrow regardless of his vote. It would be ideal to have no -1, but a single -1 doesn't block this vote, and we can move forward.

On Thu, Mar 13, 2025 at 4:42 PM Yang Jie <yangji...@apache.org> wrote:

Forgot to mention in my last reply: my stance is +1.

Jie Yang

On 2025/03/13 07:08:12 Russell Jurney wrote:

Sure, +1 non-binding.

On Wed, Mar 12, 2025 at 11:18 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

Russell,

Of course, we hear the voices of people who don't have binding votes as well. Personally, I think they matter more than the committers'/PMC members' votes this time, since we can be biased and far from the user experience.

Could you please explicitly cast your vote, like +1 (non-binding)? You seem to agree with the proposal. Thanks!

On Thu, Mar 13, 2025 at 3:15 PM Russell Jurney <russell.jur...@gmail.com> wrote:

I'm just a lurker and aspiring contributor, but as a Spark user, upgrading twice is very confusing and would cause many or most users to fail to upgrade successfully to Spark 4 on a first go. That seems like a very bad user experience. I thought it was worthwhile stating this out loud.

Russell

On Wed, Mar 12, 2025 at 11:05 PM Xiao Li <gatorsm...@gmail.com> wrote:

> this vote is to allow streaming queries which had been ever run in Spark 3.5.4 to be upgraded with Spark 4.0.x, "without having to be upgraded with Spark 3.5.5+ in prior".
In the history of Apache Spark, have we ever required users to upgrade to the next maintenance release before moving to a new feature or major release?

Xiao

On Tue, Mar 11, 2025 at 09:08 Adam Binford <adam...@gmail.com> wrote:

+1 (non-binding)

It's a pretty in-the-weeds issue with how Structured Streaming works under the hood, which is kind of hard to understand if you're not familiar with it. The migration logic doesn't mean users can still use the old config; it works purely behind the scenes to fix checkpoint metadata in streams created in 3.5.4. The 5 lines of code it takes to address a weird edge case for certain users, one that's already gone from master, shouldn't be a huge deal.

On Tue, Mar 11, 2025 at 1:43 AM Yang Jie <yangji...@apache.org> wrote:

To Sean, you're right, I'm very sorry.

From the perspective of compatibility and migratability, I think we should migrate this logic to 4.0.0 and keep it in the codebase for a longer time (or permanently), because we can't predict which version users of 3.5.4 will choose next.

I don't want to discuss the so-called vendor issue.

I withdraw my previous -1.

Jie Yang.

On 2025/03/11 04:42:25 Wenchen Fan wrote:

Guys, let’s be honest about what we’re discussing here.

If this is a migration issue, why would we even need a vote?
We’ve been consistently adding configurations to restore legacy behavior instead of removing them, because we understand the challenges of upgrading Spark versions. Our goal has always been to make upgrades easier, even if it means carrying some technical debt. I don’t think we want to change that culture now.

If the concern is about vendor names appearing in the codebase, then why is it a big deal this time when vendor names are already present elsewhere? If we’ve failed to follow a policy, let’s correct it, but can someone point to the specific policy we’re violating?

If the vote is about adding migration logic to ease the upgrade from 3.5.4 to 4.0.0, then +1, why not?

Thanks,
Wenchen

On Mon, Mar 10, 2025 at 8:49 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

Well said, Sean. Sorry to keep you around here; it might not have been clearly stated. My bad.

Yang, how could we ever tolerate the fact that there are "other" occurrences of vendor names in the codebase? Please go and search for "databricks" in the codebase and be surprised.
If we believe that having vendor names in the codebase will increase the chance of making mistakes, why didn't we have a discussion thread earlier to remove all occurrences altogether? This is super tricky, because I could even start to argue that we have "Apple" as a vendor name in the Apache Spark codebase. I'm not talking about using "apple" in the test data. See `isMacOnAppleSilicon` in Utils. Is it unavoidable? No, `isMacOnMSeries` or `isMacOnSilicon` is enough.

We really need to draw a line on where we disallow vendor names - if it's the entire codebase, I don't think that is realistic.

This was really a mistake, and it definitely did not come from referring to the existing codebase. Not having a vendor name does not change the chance of encountering this issue again. If we really care, we should think about style checking, which is the only viable way to catch the mistake. Again, I'd argue we would need a bunch of vendor names in that style check, not just the problematic vendor name.

On Tue, Mar 11, 2025 at 12:17 PM Sean Owen <sro...@gmail.com> wrote:

Doesn't the migration code 'clear' the debt?
The proposal is not to continue to support the config.
I feel like people are not quite understanding the change, and are objecting to something that doesn't exist.
It's a shame, as this seems like something not even worth discussing. I don't know why this triggered this much discussion. We have kept deprecated methods without blinking, which is a much bigger deal by comparison.
Can we maybe ask you to review the actual change in question?

On Mon, Mar 10, 2025, 10:02 PM Yang Jie <yangji...@apache.org> wrote:

-1
Remove migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.0, because I think this configuration was initially introduced accidentally in Spark 3.5.4, lacking a clear design intent. Although the immediate maintenance cost of retaining this configuration currently seems limited, as subsequent versions iterate and user habits form, it may lead to the continuous accumulation of technical debt. Once users come to view this configuration as one that can be relied on long-term, its future removal may face greater resistance from users, and it could become an entrenched and redundant configuration in the codebase.
Therefore, promptly correcting this historically accidental configuration not only maintains the normativity of the Spark configuration system but also prevents unintended configurations from becoming de facto standards, thereby reducing long-term maintenance risks.

Jie Yang

On 2025/03/10 14:52:52 Dongjoon Hyun wrote:

-1 because there exists a feasible migration path for Apache Spark 3.5.4 via Apache Spark 3.5.5.

It's obvious that this Databricks mistake has already caused a huge communication cost in the Apache Spark community, and it imposes a burden that forces us to handle at least two more PRs, at 4.0.0 and 4.1.0.

Given that, I don't think
- this is inevitable, or
- this is 0 cost.

Dongjoon.

On 2025/03/10 12:46:16 Jungtaek Lim wrote:

Starting with my +1 (non-binding).

In addition, I propose to retain the migration logic till Spark 4.1.x and remove it in Spark 4.2.0.
On Mon, Mar 10, 2025 at 9:44 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

Hi dev,

Please vote to retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x.

- DISCUSSION: https://lists.apache.org/thread/xzk9729lsmo397crdtk14f74g8cyv4sr ([DISCUSS] Handling spark.databricks.* config being exposed in 3.5.4 in Spark 4.0.0+)

Specifically, please review this post, https://lists.apache.org/thread/xtq1kjhsl4ohfon78z3wld2hmfm78t9k, which explains the pros and cons of the proposal; the proposal is "Option 1".

Simply speaking, this vote is to allow streaming queries which had been ever run in Spark 3.5.4 to be upgraded with Spark 4.0.x, "without having to be upgraded with Spark 3.5.5+ in prior". If the vote passes, we will help users have a smooth upgrade from Spark 3.5.4 to Spark 4.0.x, which would be almost 1 year.
The (only) con of this option is having to retain the incorrect configuration name as a "string" in the codebase a bit longer. The code complexity of the migration logic is arguably trivial. (link: https://github.com/apache/spark/blob/4231d58245251a34ae80a38ea4bbf7d720caa439/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L174-L183)

This VOTE is for Spark 4.0.x, but if you support keeping the migration logic beyond Spark 4.0.x, please cast +1 here and state the last minor version of Spark that should retain this migration logic.

The vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x
[ ] -1 Remove migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.0 because...

Thanks!
Jungtaek Lim (HeartSaVioR)

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

--
Adam Binford
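For readers unfamiliar with what the "migration logic" in this thread refers to: the real code is the Scala in OffsetSeq.scala linked in the vote email, and it is not reproduced here. As a rough illustration only, the Python sketch below models the idea Adam and Sean describe, where the incorrect key is read from checkpoint metadata written by 3.5.4 and rewritten to the correct key, instead of remaining a config users can set. The key names and the `migrate_offset_metadata_conf` helper are hypothetical, and the rule that an already-recorded correct key wins is an assumption, not Spark's documented behavior.

```python
from typing import Dict

# Hypothetical key names for illustration only; they are NOT the actual
# configuration names involved in the 3.5.4 incident.
INCORRECT_KEY = "spark.databricks.hypothetical.flag"
CORRECT_KEY = "spark.sql.hypothetical.flag"


def migrate_offset_metadata_conf(conf: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of checkpointed conf with the incorrect key rewritten.

    Hypothetical helper: it only rewrites what an existing checkpoint
    recorded; it does not make the incorrect key settable by users.
    """
    migrated = dict(conf)  # never mutate the caller's dict
    if INCORRECT_KEY in migrated:
        value = migrated.pop(INCORRECT_KEY)
        # Assumption: if the checkpoint already records the correct key, keep it.
        migrated.setdefault(CORRECT_KEY, value)
    return migrated


# Metadata restored from a checkpoint written by 3.5.4 would carry the incorrect key.
restored = migrate_offset_metadata_conf(
    {INCORRECT_KEY: "true", "spark.sql.shuffle.partitions": "200"}
)
print(restored)
```

Because the old name only ever appears as an input string that gets rewritten, the sketch reflects Sean's point that keeping the migration does not mean continuing to support the config.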