To the open source C,

As is apparent to me, and I believe tacitly admitted by the group in general and heard explicitly in person: any relatively complex PR which involves deeper thinking (be it a functional or a performance issue) should be the business of a member. If it's a performance issue, no way. If it's a functional issue which is becoming an embarrassment to ignore, somehow ensure that the push happens under a member's PR.
That is how exclusivity and good is to be maintained.

On Fri, May 29, 2026, 8:05 AM Asif Shahid <[email protected]> wrote:

> Based on the data I have and discussed, it's my view that the PRs opened
> by you were reactive, happening only after I had opened the initial ticket
> and PRs.
> You are talking about simplifying the issue
> https://github.com/apache/spark/pull/50757#discussion_r2069390537,
> I am willing to discuss it here, over meeting with other members of your
> open source group, as to how it simplifies?
>
> In fact, I had repeatedly said that why are we discussing in internal
> channel of company for the PR which I had filed in public Open source. In
> that discussion (the last one, before I was made redundant by company), I
> had given detailed explanation of why making each plan node emit
> indeterministic is bad idea. (I would ask you to make that last slack
> public, but I am sure that would be an issue as your company policy might
> prohibit).
>
> I understood much earlier why you and your colleague never wanted
> technical discussions on my public PRs on PR itself.
>
> The same holds for other alternate PRs including the issue of "self
> joins".
> I am willing to discuss it out with your group members, the problem it
> solves and what your alternative PR does not.
>
> I am not sure if this is generic approach of the "members", to ensure that
> final checkin happens under their authorship.
>
> On Fri, May 29, 2026, 1:58 AM Peter Toth <[email protected]> wrote:
>
>> Hi Viquar,
>>
>> To resolve the immediate discrepancy, I ask that we formally link
>>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
>>> add a JIRA comment explicitly crediting Asif as the original co-discoverer
>>> of both the regression and the baseline fix. This standard attribution
>>> costs us nothing but preserves the integrity of our commit history.
>> >> >> SPARK-56694 is a duplicate of SPARK-45658 (not SPARK-45866), but I agree >> it's a fair point to link the tickets and mention Asif's previous work. Let >> me add a comment to both the ticket and the PR. >> >> Conversely, SPARK-56694 bypassed the queue and was merged within eight >>> hours >> >> >> I don't know, is there a queue? As for my work process, when I have some >> time for upstream reviews, I don't follow any queue. I just pick PRs that I >> find interesting or that relate to my experience with Spark. And despite >> its size, https://github.com/apache/spark/pull/55644/changes is >> technically just a one-liner, fairly trivial fix so review within 8 hours >> isn't extraordinary. >> >> Hi Asif, >> >> you opened an alternate PR, which... >>> >> What issue did u see in the logic, that an alternate PR was opened... >> >> >> I think the reason for my simplification approach was discussed both >> offline and online in this thread: >> https://github.com/apache/spark/pull/50757#discussion_r2069390537 >> > > > > >> Best, >> Peter >> >> On Thu, May 28, 2026 at 10:29 PM vaquar khan <[email protected]> >> wrote: >> >>> Hi all, >>> >>> I have thoroughly reviewed the technical artifacts surrounding the >>> recent Catalyst optimizer canonicalization discussions to help guide this >>> toward a constructive resolution. >>> >>> We must address a tangible breakdown in our review pipeline. SPARK-45866 >>> and its corresponding PR #49154 correctly identified this complex Catalyst >>> regression in late 2023, yet the ticket remained unaddressed. *Conversely, >>> SPARK-56694 bypassed the queue and was merged within eight hours without >>> referencing the prior art*. Peter has transparently acknowledged the >>> oversight in searching for existing tickets, but we still need to close the >>> loop. 
>>> >>> To resolve the immediate discrepancy,* I ask that we formally link >>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and >>> add a JIRA comment explicitly crediting Asif as the original co-discoverer >>> of both the regression and the baseline fix. This standard attribution >>> costs us nothing but preserves the integrity of our commit history. * >>> >>> Stepping back, this incident highlights a critical systemic risk to our >>> contributor ecosystem. The stark asymmetry in review velocity where an >>> external contributor's highly complex PR sits stagnant for months/years, >>> while an identical internal PR is merged in hours creates visible friction. >>> Even if entirely unintentional due to organizational overload, this pattern >>> discourages the high-level engineering talent required to sustain the >>> project's momentum. >>> >>> To maintain Spark’s technical leadership, we must actively cultivate a >>> culture where contributions are prioritized strictly by their architectural >>> merit, regardless of authorship. Furthermore, we must normalize the habit >>> of proactively acknowledging independent work when parallel discoveries >>> surface. Small, intentional shifts in our governance and review cadence >>> will yield massive dividends in community trust and long-term innovation. >>> >>> Best regards, >>> Viquar Khan >>> https://www.linkedin.com/in/vaquar-khan-b695577/?skipRedirect=true >>> >>> >>> On Thu, 28 May 2026 at 13:42, Asif Shahid <[email protected]> wrote: >>> >>>> Also I must admit that I did not know oss works by opening alternate >>>> PRs. >>>> In the places where I have worked most of my life, we work on the >>>> opened PR with the original author and try to bridge the gap. >>>> >>>> On Thu, May 28, 2026, 11:25 AM Asif Shahid <[email protected]> >>>> wrote: >>>> >>>>> In fact, I showed it not just to you but other colleague of yours too. 
>>>>> But there has been absolutely no comment or anything on that from then , >>>>> till now. >>>>> >>>>> On Thu, May 28, 2026 at 11:19 AM Asif Shahid <[email protected]> >>>>> wrote: >>>>> >>>>>> also take a look at this jira >>>>>> https://issues.apache.org/jira/browse/SPARK-47320 >>>>>> for this also an alternate PR was opened. >>>>>> This problem is do deep in code, that I even showed you that in the >>>>>> existing test itself, if the join condition's operand are swapped, test >>>>>> fails.. Its completely broken , the self joins. >>>>>> I had proposed a consistent fix, which solves the issue completely >>>>>> and logically, but again an alternate PR was filed.. >>>>>> What issue was there in my PR , which I created...? >>>>>> Regards >>>>>> Asif >>>>>> >>>>>> On Thu, May 28, 2026 at 11:14 AM Asif Shahid <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, May 28, 2026 at 10:56 AM Peter Toth <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> As for the fix, itself, is not indicative of any thing as its a one >>>>>>>>> liner, test has uncanny resemblance >>>>>>>> >>>>>>>> >>>>>>>> Asif, what exactly is the "uncanny resemblance" between those test >>>>>>>> cases in https://github.com/apache/spark/pull/49154/changes vs >>>>>>>> https://github.com/apache/spark/pull/55644/changes ? Besides the >>>>>>>> fact that obviously they are comparing canonicalized forms. >>>>>>>> Again, sorry for not noticing your PR, but I don't feel my fix has >>>>>>>> anything to do with yours. >>>>>>>> >>>>>>> Ok. I respect your opinion. Each one is entitled to its own view >>>>>>> >>>>>>>> >>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016 >>>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2- >>>>>>>>> 3 weeks. I filed a PR. The bug was fixed via a different PR , taken a >>>>>>>>> different route. 
>>>>>>>> >>>>>>>> >>>>>>>> Do you see anything in common between >>>>>>>> https://github.com/apache/spark/pull/50029/changes and >>>>>>>> https://github.com/apache/spark/pull/50757/changes ? >>>>>>>> Because I do see. That someone else had a much better idea: >>>>>>>> https://github.com/apache/spark/pull/50757#issuecomment-2844972082 >>>>>>>> / https://github.com/apache/spark/pull/50230 and it was >>>>>>>> implemented for the benefit of Spark. >>>>>>>> IMO, that's the normal way of dealing with issues in an open-source >>>>>>>> project. Ideas come and go and hopefully the one best wins. >>>>>>>> >>>>>>> The checksum approach has its expense. That can come later , because >>>>>>> apriori its possible to detect whether the expression is returning value >>>>>>> from an indeterministic expression. >>>>>>> You opened an alternate PR, which I have described in the PR >>>>>>> discussion that to fix the round robin issue which you were dealing >>>>>>> with, >>>>>>> you are trying to impose an order in in-deterministic expression >>>>>>> evaluattion, which itself is against the basic premise that if data is >>>>>>> in-determinate, there cannot be order in it. >>>>>>> What issue did u see in the logic, that an alternate PR was >>>>>>> opened...which impacted all the stages ( including the ancestors?) and I >>>>>>> already discussed internally why the idea you had in mind would not >>>>>>> work. I >>>>>>> specifically asked, why dont we discuss via the PR filed... >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Peter >>>>>>>> >>>>>>>> On Thu, May 28, 2026 at 6:38 PM Asif Shahid <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Nicholas, >>>>>>>>> You wanted some examples , right: >>>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016 >>>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2- >>>>>>>>> 3 weeks. I filed a PR. The bug was fixed via a different PR , taken a >>>>>>>>> different route. 
>>>>>>>>> Did any one who created new PR and route, showed up any >>>>>>>>> unaddressable logical issue? >>>>>>>>> The same goes for all the PRs ( which in case I have closed) >>>>>>>>> Regards >>>>>>>>> Asif >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, May 28, 2026 at 9:06 AM Nicholas Chammas < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> I think repeatedly calling the contributors on this list a >>>>>>>>>> “cartel” is not conducive to a calm and amicable resolution. >>>>>>>>>> >>>>>>>>>> You may have some history built up that led you to use that word, >>>>>>>>>> but to the rest of us it comes out of nowhere; you in fact opened >>>>>>>>>> this >>>>>>>>>> thread with that attack. If you keep making your case in this >>>>>>>>>> manner, you >>>>>>>>>> will just turn everyone against you. >>>>>>>>>> >>>>>>>>>> If there is a history of what you feel is others stealing your >>>>>>>>>> work, please link to a few examples so we can see what you are >>>>>>>>>> seeing. If >>>>>>>>>> you can’t do that, then just focus on this current example. And try >>>>>>>>>> to >>>>>>>>>> refrain from calling people names unless your goal is just to have a >>>>>>>>>> fight, >>>>>>>>>> as opposed to resolving the problematic behavior so you can continue >>>>>>>>>> to >>>>>>>>>> contribute. >>>>>>>>>> >>>>>>>>>> I am not a committer and don’t have any special role in this >>>>>>>>>> community. I am speaking just as an observer and regular contributor >>>>>>>>>> to the >>>>>>>>>> project. 
>>>>>>>>>> >>>>>>>>>> > I have experienced this before, as recent as couple of months >>>>>>>>>> back ( https://issues.apache.org/jira/browse/SPARK-54386) >>>>>>>>>> >>>>>>>>>> For others following along, I took a look at this ticket and the >>>>>>>>>> associated PRs: #53261 >>>>>>>>>> <https://github.com/apache/spark/pull/53261> / #53100 >>>>>>>>>> <https://github.com/apache/spark/pull/53100> >>>>>>>>>> >>>>>>>>>> It looks like Asif is upset that he submitted a fix for the same >>>>>>>>>> issue a week or so prior to the fix that eventually got merged. But >>>>>>>>>> the >>>>>>>>>> fixes are different, and the one that got merged is a lot shorter, >>>>>>>>>> though >>>>>>>>>> they are both simple. The PR that got merged was submitted by >>>>>>>>>> someone who >>>>>>>>>> appears to be employed by Databricks; perhaps this is part of the >>>>>>>>>> “cartel” >>>>>>>>>> accusation. The two PRs were reviewed by different committers, >>>>>>>>>> however, and >>>>>>>>>> the one that got merged was merged in by someone who does _not_ work >>>>>>>>>> for >>>>>>>>>> Databricks. >>>>>>>>>> >>>>>>>>>> I don’t see anything here other than the normal dynamic of a >>>>>>>>>> large and busy open source project. Committer attention is limited; >>>>>>>>>> things >>>>>>>>>> fall through the cracks; different contributors may occasionally >>>>>>>>>> work on >>>>>>>>>> the same thing without knowing about each other. A minor help to this >>>>>>>>>> specific problem would be to have some way of automatically linking >>>>>>>>>> issues >>>>>>>>>> that appear to be about the same thing. 
>>>>>>>>>> >>>>>>>>>> Nick >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On May 28, 2026, at 11:33 AM, Asif Shahid <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi Peter, >>>>>>>>>> Pls see inline for comments/ replies >>>>>>>>>> >>>>>>>>>> On Thu, May 28, 2026 at 6:11 AM Peter Toth <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hey Asif, >>>>>>>>>>> >>>>>>>>>>> Are you referring to >>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs. >>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes? Those are >>>>>>>>>>> definitely solving the same issue but I can assure you I wouldn't >>>>>>>>>>> take any >>>>>>>>>>> code from your PR without consulting with you first. >>>>>>>>>>> >>>>>>>>>> Yes Indeed Peter, I am referring to those. >>>>>>>>>> As for the fix, itself, is not indicative of any thing as its a >>>>>>>>>> one liner, test has uncanny resemblance. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> As far as I remember, I opened SPARK-56694 / >>>>>>>>>>> https://github.com/apache/spark/pull/55644 because I ran into >>>>>>>>>>> that minor bug during the implementation of >>>>>>>>>>> https://github.com/apache/spark/pull/55298. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Sorry, I didn't check whether a ticket or PR already existed. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The below I am addressing to the whole cartel.: >>>>>>>>>> I have experienced this before, as recent as couple of months >>>>>>>>>> back ( https://issues.apache.org/jira/browse/SPARK-54386) >>>>>>>>>> I have experienced, my personal effort ( going into weeks) to >>>>>>>>>> debug, reproduce issue reliably , being hijacked by members, without >>>>>>>>>> even >>>>>>>>>> discussing the fix proposed, ( by opening new PRs). 
>>>>>>>>>> (If interested, I can provide details of the PRs / issues I am talking about.)
>>>>>>>>>> I have seen a perfectly valid PR being nixed by the following
>>>>>>>>>> comment, which essentially said
>>>>>>>>>> "my code of making the cache lookup more effective would
>>>>>>>>>> result in greater chances of stale cache being picked, which
>>>>>>>>>> Spark already suffers from."
>>>>>>>>>> Now the PR was related to collapsing the projects in the analysis
>>>>>>>>>> phase, and the side effect was cache pickup being more sensitive.
>>>>>>>>>> So this is such a frivolous reason to nix the PR, because
>>>>>>>>>> "staleness" is an underlying existing issue which had nothing to do
>>>>>>>>>> with my PR. And it's more amusing that if a DB is giving even one wrong
>>>>>>>>>> result in millions, that makes all the results suspect in any case. It
>>>>>>>>>> does not matter at what frequency this occurs. To me the real reason was
>>>>>>>>>> code complexity (& more likely the loss of control of the code to the
>>>>>>>>>> outsider).
>>>>>>>>>>
>>>>>>>>>> The reason I call this open source community a cartel is
>>>>>>>>>> because I have seen the way it works pretty closely and have
>>>>>>>>>> experienced it in the email exchanges which happen on this group.
>>>>>>>>>> For the same PR, same issue, if advertently or inadvertently
>>>>>>>>>> another person (especially a member) gets his changes pushed, by
>>>>>>>>>> virtue of his standing/position and the "for profit" company the person
>>>>>>>>>> works for, how would you give the credit to the original person who
>>>>>>>>>> discovered the issue first / provided the fix?
>>>>>>>>>> Why are issues filed by some immediately worked upon by members
>>>>>>>>>> (some of whom claim to be working full time on Spark)?
>>>>>>>>>> Is it because certain companies / groups (for profit companies, mind you)
>>>>>>>>>> exert undue control, or the petty newbie has to be in the good books of
>>>>>>>>>> members (with the hope that at some point they will also reach that
>>>>>>>>>> position of power?)
>>>>>>>>>>
>>>>>>>>>> Given the AI advent and such occurrences, how will you give due
>>>>>>>>>> credit to the original creators, and how do you plan to prevent some
>>>>>>>>>> member from taking up the idea of any old open PR (which for reasons of
>>>>>>>>>> complexity and non technical reasons), polishing it up and pushing it as
>>>>>>>>>> their own?
>>>>>>>>>>
>>>>>>>>>> I am also curious: am I the only one who is troubled by all
>>>>>>>>>> this, or are there others who have experienced it?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Asif
>>>>>>>>>>
>>>>>>>>>>> If you have further improvements please feel free to open a PR.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I had filed a bug
>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-45866
>>>>>>>>>>>>
>>>>>>>>>>>> I had also opened a PR for the same.
>>>>>>>>>>>>
>>>>>>>>>>>> Now I see that the ticket I filed is still open, but the issue
>>>>>>>>>>>> has been fixed using a new ticket
>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>>>>>>>>>>
>>>>>>>>>>>> and on top of that the bug test and of course the fix (which in
>>>>>>>>>>>> any case would be same) has been taken from my PR for
>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>>>>>>>>>>
>>>>>>>>>>>> To me this is clear unethical conduct of cartel member, unless
>>>>>>>>>>>> I am missing some valid reason.
>>>>>>>>>>>> >>>>>>>>>>>> And the irony is that the fix is still incomplete, as I just >>>>>>>>>>>> found and filed a new ticket >>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-57126 >>>>>>>>>>>> >>>>>>>>>>>> I know that atleast some cartel members are insecure and think >>>>>>>>>>>> of OSS as their fiefdom, but this sort of behaviour , I never >>>>>>>>>>>> expected. >>>>>>>>>>>> Regards >>>>>>>>>>>> Asif >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>
