Hi Viquar,

> To resolve the immediate discrepancy, I ask that we formally link SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and add a JIRA comment explicitly crediting Asif as the original co-discoverer of both the regression and the baseline fix. This standard attribution costs us nothing but preserves the integrity of our commit history.
SPARK-56694 is a duplicate of SPARK-45658 (not SPARK-45866), but I agree it's a fair point to link the tickets and mention Asif's previous work. Let me add a comment to both the ticket and the PR.

> Conversely, SPARK-56694 bypassed the queue and was merged within eight hours

I don't know, is there a queue? As for my work process, when I have some time for upstream reviews, I don't follow any queue. I just pick PRs that I find interesting or that relate to my experience with Spark. And despite its size, https://github.com/apache/spark/pull/55644/changes is technically just a one-liner, fairly trivial fix so review within 8 hours isn't extraordinary.

Hi Asif,

> you opened an alternate PR, which...
> What issue did u see in the logic, that an alternate PR was opened...

I think the reason for my simplification approach was discussed both offline and online in this thread: https://github.com/apache/spark/pull/50757#discussion_r2069390537

Best,
Peter

On Thu, May 28, 2026 at 10:29 PM vaquar khan <[email protected]> wrote:

> Hi all,
>
> I have thoroughly reviewed the technical artifacts surrounding the recent Catalyst optimizer canonicalization discussions to help guide this toward a constructive resolution.
>
> We must address a tangible breakdown in our review pipeline. SPARK-45866 and its corresponding PR #49154 correctly identified this complex Catalyst regression in late 2023, yet the ticket remained unaddressed. *Conversely, SPARK-56694 bypassed the queue and was merged within eight hours without referencing the prior art*. Peter has transparently acknowledged the oversight in searching for existing tickets, but we still need to close the loop.
>
> To resolve the immediate discrepancy, *I ask that we formally link SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and add a JIRA comment explicitly crediting Asif as the original co-discoverer of both the regression and the baseline fix.
> This standard attribution costs us nothing but preserves the integrity of our commit history.*
>
> Stepping back, this incident highlights a critical systemic risk to our contributor ecosystem. The stark asymmetry in review velocity where an external contributor's highly complex PR sits stagnant for months/years, while an identical internal PR is merged in hours creates visible friction. Even if entirely unintentional due to organizational overload, this pattern discourages the high-level engineering talent required to sustain the project's momentum.
>
> To maintain Spark’s technical leadership, we must actively cultivate a culture where contributions are prioritized strictly by their architectural merit, regardless of authorship. Furthermore, we must normalize the habit of proactively acknowledging independent work when parallel discoveries surface. Small, intentional shifts in our governance and review cadence will yield massive dividends in community trust and long-term innovation.
>
> Best regards,
> Viquar Khan
> https://www.linkedin.com/in/vaquar-khan-b695577/?skipRedirect=true
>
> On Thu, 28 May 2026 at 13:42, Asif Shahid <[email protected]> wrote:
>
>> Also I must admit that I did not know oss works by opening alternate PRs. In the places where I have worked most of my life, we work on the opened PR with the original author and try to bridge the gap.
>>
>> On Thu, May 28, 2026, 11:25 AM Asif Shahid <[email protected]> wrote:
>>
>>> In fact, I showed it not just to you but other colleague of yours too. But there has been absolutely no comment or anything on that from then, till now.
>>>
>>> On Thu, May 28, 2026 at 11:19 AM Asif Shahid <[email protected]> wrote:
>>>
>>>> also take a look at this jira
>>>> https://issues.apache.org/jira/browse/SPARK-47320
>>>> for this also an alternate PR was opened.
>>>> This problem is so deep in code, that I even showed you that in the existing test itself, if the join condition's operands are swapped, the test fails. It's completely broken, the self joins.
>>>> I had proposed a consistent fix, which solves the issue completely and logically, but again an alternate PR was filed.
>>>> What issue was there in my PR, which I created?
>>>> Regards
>>>> Asif
>>>>
>>>> On Thu, May 28, 2026 at 11:14 AM Asif Shahid <[email protected]> wrote:
>>>>
>>>>> On Thu, May 28, 2026 at 10:56 AM Peter Toth <[email protected]> wrote:
>>>>>
>>>>>>> As for the fix, itself, is not indicative of any thing as its a one liner, test has uncanny resemblance
>>>>>>
>>>>>> Asif, what exactly is the "uncanny resemblance" between those test cases in https://github.com/apache/spark/pull/49154/changes vs https://github.com/apache/spark/pull/55644/changes ? Besides the fact that obviously they are comparing canonicalized forms.
>>>>>> Again, sorry for not noticing your PR, but I don't feel my fix has anything to do with yours.
>>>>>>
>>>>> Ok. I respect your opinion. Each one is entitled to its own view
>>>>>
>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2- 3 weeks. I filed a PR. The bug was fixed via a different PR , taken a different route.
>>>>>>
>>>>>> Do you see anything in common between https://github.com/apache/spark/pull/50029/changes and https://github.com/apache/spark/pull/50757/changes ?
>>>>>> Because I do see. That someone else had a much better idea: https://github.com/apache/spark/pull/50757#issuecomment-2844972082 / https://github.com/apache/spark/pull/50230 and it was implemented for the benefit of Spark.
>>>>>> IMO, that's the normal way of dealing with issues in an open-source project.
>>>>>> Ideas come and go and hopefully the best one wins.
>>>>>>
>>>>> The checksum approach has its expense. That can come later, because a priori it's possible to detect whether the expression is returning a value from an indeterministic expression.
>>>>> You opened an alternate PR, which I have described in the PR discussion that to fix the round robin issue which you were dealing with, you are trying to impose an order in in-deterministic expression evaluation, which itself is against the basic premise that if data is in-determinate, there cannot be order in it.
>>>>> What issue did u see in the logic, that an alternate PR was opened...which impacted all the stages (including the ancestors?) and I already discussed internally why the idea you had in mind would not work. I specifically asked, why don't we discuss via the PR filed...
>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On Thu, May 28, 2026 at 6:38 PM Asif Shahid <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Nicholas,
>>>>>>> You wanted some examples, right:
>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2-3 weeks. I filed a PR. The bug was fixed via a different PR, taken a different route.
>>>>>>> Did any one who created new PR and route, showed up any unaddressable logical issue?
>>>>>>> The same goes for all the PRs (which in case I have closed)
>>>>>>> Regards
>>>>>>> Asif
>>>>>>>
>>>>>>> On Thu, May 28, 2026 at 9:06 AM Nicholas Chammas <[email protected]> wrote:
>>>>>>>
>>>>>>>> I think repeatedly calling the contributors on this list a “cartel” is not conducive to a calm and amicable resolution.
>>>>>>>>
>>>>>>>> You may have some history built up that led you to use that word, but to the rest of us it comes out of nowhere; you in fact opened this thread with that attack. If you keep making your case in this manner, you will just turn everyone against you.
>>>>>>>>
>>>>>>>> If there is a history of what you feel is others stealing your work, please link to a few examples so we can see what you are seeing. If you can’t do that, then just focus on this current example. And try to refrain from calling people names unless your goal is just to have a fight, as opposed to resolving the problematic behavior so you can continue to contribute.
>>>>>>>>
>>>>>>>> I am not a committer and don’t have any special role in this community. I am speaking just as an observer and regular contributor to the project.
>>>>>>>>
>>>>>>>> > I have experienced this before, as recent as couple of months back ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>
>>>>>>>> For others following along, I took a look at this ticket and the associated PRs: #53261 <https://github.com/apache/spark/pull/53261> / #53100 <https://github.com/apache/spark/pull/53100>
>>>>>>>>
>>>>>>>> It looks like Asif is upset that he submitted a fix for the same issue a week or so prior to the fix that eventually got merged. But the fixes are different, and the one that got merged is a lot shorter, though they are both simple. The PR that got merged was submitted by someone who appears to be employed by Databricks; perhaps this is part of the “cartel” accusation. The two PRs were reviewed by different committers, however, and the one that got merged was merged in by someone who does _not_ work for Databricks.
>>>>>>>>
>>>>>>>> I don’t see anything here other than the normal dynamic of a large and busy open source project. Committer attention is limited; things fall through the cracks; different contributors may occasionally work on the same thing without knowing about each other. A minor help to this specific problem would be to have some way of automatically linking issues that appear to be about the same thing.
>>>>>>>>
>>>>>>>> Nick
>>>>>>>>
>>>>>>>> On May 28, 2026, at 11:33 AM, Asif Shahid <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>> Pls see inline for comments/ replies
>>>>>>>>
>>>>>>>> On Thu, May 28, 2026 at 6:11 AM Peter Toth <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey Asif,
>>>>>>>>>
>>>>>>>>> Are you referring to https://github.com/apache/spark/pull/49154/changes vs. https://github.com/apache/spark/pull/55644/changes? Those are definitely solving the same issue but I can assure you I wouldn't take any code from your PR without consulting with you first.
>>>>>>>>>
>>>>>>>> Yes Indeed Peter, I am referring to those.
>>>>>>>> As for the fix, itself, is not indicative of any thing as its a one liner, test has uncanny resemblance.
>>>>>>>>
>>>>>>>>> As far as I remember, I opened SPARK-56694 / https://github.com/apache/spark/pull/55644 because I ran into that minor bug during the implementation of https://github.com/apache/spark/pull/55298.
>>>>>>>>>
>>>>>>>>> Sorry, I didn't check whether a ticket or PR already existed.
>>>>>>>>>
>>>>>>>> The below I am addressing to the whole cartel:
>>>>>>>> I have experienced this before, as recent as couple of months back (https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>> I have experienced, my personal effort (going into weeks) to debug, reproduce issue reliably, being hijacked by members, without even discussing the fix proposed (by opening new PRs). (If interested, I can provide details of the PRs / issues I am talking about)
>>>>>>>> I have seen a perfectly valid PR being nixed, by following comment which essentially said
>>>>>>>> "my code of making the cache lookup more effective, would result in greater chances of stale cache being picked, which already spark suffers from."
>>>>>>>> Now the PR was related to collapsing the projects in analysis phase, and side effect was cache pick up being more sensitive.
>>>>>>>> So this is such a frivolous reason to nix the PR, because "staleness" is an underlying existing issue which had nothing to do with my PR. And its more amusing, that if a DB is giving even one wrong result in millions, that makes all the results a suspect in any case. It does not matter at what frequency this occurs. To me the real reason was code complexity (& more likely the loss of control of the code to the outsider).
>>>>>>>>
>>>>>>>> The reason I call this open source community as cartel, is because, I have seen the way it works pretty closely and have experienced it in the email exchanges which happen on this group.
>>>>>>>> For the same PR, same issue, if advertently or inadvertently, other person (especially a member) gets his changes pushed, by the virtue of his standing/position and the "for profit" company the person works, how would you give the credit to the original person who discovered the issue first / provided the fix?
>>>>>>>> Why are issues filed by some immediately worked upon by members (some of whom claim to be working full time on spark)? Is it because certain companies / groups (for profit companies, mind you) exert undue control, or the petty newbie has to be in the good books of members (with the hope that at some point they will also reach that position of power?)
>>>>>>>>
>>>>>>>> Given the AI advent and such occurrences, how will you give due credit to the original creators and how do you plan to prevent some member for taking up idea of any old open PR (which for reasons of complexity and non technical reasons), polishing it up and pushing it as their own?
>>>>>>>>
>>>>>>>> I am also curious, am I the only one who is troubled by all this, or there are others who have experienced it?
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Asif
>>>>>>>>
>>>>>>>>> If you have further improvements please feel free to open a PR.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> I had filed a bug
>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-45866
>>>>>>>>>>
>>>>>>>>>> I had also opened a PR for the same.
>>>>>>>>>>
>>>>>>>>>> Now I see that the ticket I filed is still open, but the issue has been fixed using a new ticket
>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>>>>>>>>
>>>>>>>>>> and on top of that the bug test and of course the fix (which in any case would be same) has been taken from my PR for
>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>>>>>>>>
>>>>>>>>>> To me this is clear unethical conduct of cartel member, unless I am missing some valid reason.
>>>>>>>>>>
>>>>>>>>>> And the irony is that the fix is still incomplete, as I just found and filed a new ticket
>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-57126
>>>>>>>>>>
>>>>>>>>>> I know that at least some cartel members are insecure and think of OSS as their fiefdom, but this sort of behaviour, I never expected.
>>>>>>>>>> Regards
>>>>>>>>>> Asif
