Hi Viquar,

> To resolve the immediate discrepancy, I ask that we formally link
> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
> add a JIRA comment explicitly crediting Asif as the original co-discoverer
> of both the regression and the baseline fix. This standard attribution
> costs us nothing but preserves the integrity of our commit history.


SPARK-56694 is a duplicate of SPARK-45658 (not SPARK-45866), but I agree
it's a fair point to link the tickets and mention Asif's previous work. Let
me add a comment to both the ticket and the PR.

> Conversely, SPARK-56694 bypassed the queue and was merged within eight hours


I don't know, is there a queue? As for my work process, when I have some
time for upstream reviews, I don't follow any queue. I just pick PRs that I
find interesting or that relate to my experience with Spark. And despite
its size, https://github.com/apache/spark/pull/55644/changes is technically
just a one-liner, a fairly trivial fix, so a review within 8 hours isn't
extraordinary.

Hi Asif,

> You opened an alternate PR, which...
> What issue did u see in the logic, that an alternate PR was opened...


I think the reason for my simplification approach was discussed both
offline and online in this thread:
https://github.com/apache/spark/pull/50757#discussion_r2069390537

Best,
Peter

On Thu, May 28, 2026 at 10:29 PM vaquar khan <[email protected]> wrote:

> Hi all,
>
> I have thoroughly reviewed the technical artifacts surrounding the recent
> Catalyst optimizer canonicalization discussions to help guide this toward a
> constructive resolution.
>
> We must address a tangible breakdown in our review pipeline. SPARK-45866
> and its corresponding PR #49154 correctly identified this complex Catalyst
> regression in late 2023, yet the ticket remained unaddressed. *Conversely,
> SPARK-56694 bypassed the queue and was merged within eight hours without
> referencing the prior art*. Peter has transparently acknowledged the
> oversight in searching for existing tickets, but we still need to close the
> loop.
>
> To resolve the immediate discrepancy,* I ask that we formally link
> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
> add a JIRA comment explicitly crediting Asif as the original co-discoverer
> of both the regression and the baseline fix. This standard attribution
> costs us nothing but preserves the integrity of our commit history.  *
>
> Stepping back, this incident highlights a critical systemic risk to our
> contributor ecosystem. The stark asymmetry in review velocity, where an
> external contributor's highly complex PR sits stagnant for months/years
> while an identical internal PR is merged in hours, creates visible friction.
> Even if entirely unintentional due to organizational overload, this pattern
> discourages the high-level engineering talent required to sustain the
> project's momentum.
>
> To maintain Spark’s technical leadership, we must actively cultivate a
> culture where contributions are prioritized strictly by their architectural
> merit, regardless of authorship. Furthermore, we must normalize the habit
> of proactively acknowledging independent work when parallel discoveries
> surface. Small, intentional shifts in our governance and review cadence
> will yield massive dividends in community trust and long-term innovation.
>
> Best regards,
> Viquar Khan
> https://www.linkedin.com/in/vaquar-khan-b695577/?skipRedirect=true
>
>
> On Thu, 28 May 2026 at 13:42, Asif Shahid <[email protected]> wrote:
>
>> Also, I must admit that I did not know OSS works by opening alternate PRs.
>> In the places where I have worked most of my life, we work on the opened
>> PR with the original author and try to bridge the gap.
>>
>> On Thu, May 28, 2026, 11:25 AM Asif Shahid <[email protected]> wrote:
>>
>>> In fact, I showed it not just to you but to another colleague of yours too.
>>> But there has been absolutely no comment or anything on that from then
>>> till now.
>>>
>>> On Thu, May 28, 2026 at 11:19 AM Asif Shahid <[email protected]>
>>> wrote:
>>>
>>>> Also take a look at this JIRA:
>>>> https://issues.apache.org/jira/browse/SPARK-47320
>>>> For this too, an alternate PR was opened.
>>>> This problem is so deep in the code that I even showed you that, in the
>>>> existing test itself, if the join condition's operands are swapped, the
>>>> test fails. Self joins are completely broken.
>>>> I had proposed a consistent fix, which solves the issue completely and
>>>> logically, but again an alternate PR was filed.
>>>> What issue was there in the PR which I created?
>>>> Regards
>>>> Asif
>>>>
>>>> On Thu, May 28, 2026 at 11:14 AM Asif Shahid <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, May 28, 2026 at 10:56 AM Peter Toth <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> As for the fix, itself, is not indicative of any thing as its a one
>>>>>>> liner, test has uncanny resemblance
>>>>>>
>>>>>>
>>>>>> Asif, what exactly is the "uncanny resemblance" between those test
>>>>>> cases in https://github.com/apache/spark/pull/49154/changes vs
>>>>>> https://github.com/apache/spark/pull/55644/changes ? Besides the
>>>>>> fact that obviously they are comparing canonicalized forms.
>>>>>> Again, sorry for not noticing your PR, but I don't feel my fix has
>>>>>> anything to do with yours.
>>>>>>
>>>>> Ok. I respect your opinion. Everyone is entitled to their own view.
>>>>>
>>>>>>
>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2- 3
>>>>>>> weeks. I filed a PR. The bug was fixed via a different PR , taken a
>>>>>>> different route.
>>>>>>
>>>>>>
>>>>>> Do you see anything in common between
>>>>>> https://github.com/apache/spark/pull/50029/changes and
>>>>>> https://github.com/apache/spark/pull/50757/changes ?
>>>>>> Because I do. Someone else had a much better idea:
>>>>>> https://github.com/apache/spark/pull/50757#issuecomment-2844972082 /
>>>>>> https://github.com/apache/spark/pull/50230 and it was implemented
>>>>>> for the benefit of Spark.
>>>>>> IMO, that's the normal way of dealing with issues in an open-source
>>>>>> project. Ideas come and go and hopefully the best one wins.
>>>>>>
>>>>> The checksum approach has its expense. That can come later, because
>>>>> a priori it's possible to detect whether the expression is returning a
>>>>> value from an indeterministic expression.
>>>>> You opened an alternate PR, which, as I have described in the PR
>>>>> discussion, tries to fix the round robin issue you were dealing with
>>>>> by imposing an order on indeterministic expression evaluation, which
>>>>> itself is against the basic premise that if data is indeterminate,
>>>>> there cannot be order in it.
>>>>> What issue did you see in the logic, that an alternate PR was
>>>>> opened, which impacted all the stages (including the ancestors?), and I
>>>>> already discussed internally why the idea you had in mind would not work.
>>>>> I specifically asked, why don't we discuss via the PR filed...
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On Thu, May 28, 2026 at 6:38 PM Asif Shahid <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Nicholas,
>>>>>>> You wanted some examples , right:
>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2- 3
>>>>>>> weeks. I filed a PR. The bug was fixed via a different PR , taken a
>>>>>>> different route.
>>>>>>> Did anyone who created the new PR and route point out any
>>>>>>> unaddressable logical issue?
>>>>>>> The same goes for all the PRs ( which in case I have closed)
>>>>>>> Regards
>>>>>>> Asif
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 28, 2026 at 9:06 AM Nicholas Chammas <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I think repeatedly calling the contributors on this list a “cartel”
>>>>>>>> is not conducive to a calm and amicable resolution.
>>>>>>>>
>>>>>>>> You may have some history built up that led you to use that word,
>>>>>>>> but to the rest of us it comes out of nowhere; you in fact opened this
>>>>>>>> thread with that attack. If you keep making your case in this manner,
>>>>>>>> you will just turn everyone against you.
>>>>>>>>
>>>>>>>> If there is a history of what you feel is others stealing your
>>>>>>>> work, please link to a few examples so we can see what you are seeing.
>>>>>>>> If you can’t do that, then just focus on this current example. And try
>>>>>>>> to refrain from calling people names unless your goal is just to have a
>>>>>>>> fight, as opposed to resolving the problematic behavior so you can
>>>>>>>> continue to contribute.
>>>>>>>>
>>>>>>>> I am not a committer and don’t have any special role in this
>>>>>>>> community. I am speaking just as an observer and regular contributor
>>>>>>>> to the project.
>>>>>>>>
>>>>>>>> > I have experienced this before, as recent as couple of months
>>>>>>>> back ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>
>>>>>>>> For others following along, I took a look at this ticket and the
>>>>>>>> associated PRs: #53261 <https://github.com/apache/spark/pull/53261>
>>>>>>>>  / #53100 <https://github.com/apache/spark/pull/53100>
>>>>>>>>
>>>>>>>> It looks like Asif is upset that he submitted a fix for the same
>>>>>>>> issue a week or so prior to the fix that eventually got merged. But the
>>>>>>>> fixes are different, and the one that got merged is a lot shorter,
>>>>>>>> though they are both simple. The PR that got merged was submitted by
>>>>>>>> someone who appears to be employed by Databricks; perhaps this is part
>>>>>>>> of the “cartel” accusation. The two PRs were reviewed by different
>>>>>>>> committers, however, and the one that got merged was merged in by
>>>>>>>> someone who does _not_ work for Databricks.
>>>>>>>>
>>>>>>>> I don’t see anything here other than the normal dynamic of a large
>>>>>>>> and busy open source project. Committer attention is limited; things
>>>>>>>> fall through the cracks; different contributors may occasionally work on
>>>>>>>> the same thing without knowing about each other. A minor help to this
>>>>>>>> specific problem would be to have some way of automatically linking
>>>>>>>> issues that appear to be about the same thing.
>>>>>>>>
>>>>>>>> Nick
>>>>>>>>
>>>>>>>>
>>>>>>>> On May 28, 2026, at 11:33 AM, Asif Shahid <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>> Pls see inline for comments/ replies
>>>>>>>>
>>>>>>>> On Thu, May 28, 2026 at 6:11 AM Peter Toth <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hey Asif,
>>>>>>>>>
>>>>>>>>> Are you referring to
>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs.
>>>>>>>>> https://github.com/apache/spark/pull/55644/changes? Those are
>>>>>>>>> definitely solving the same issue but I can assure you I wouldn't
>>>>>>>>> take any code from your PR without consulting with you first.
>>>>>>>>>
>>>>>>>>  Yes Indeed Peter, I am referring to those.
>>>>>>>> As for the fix, itself, is not indicative of any thing as its a one
>>>>>>>> liner, test has uncanny resemblance.
>>>>>>>>
>>>>>>>>
>>>>>>>>> As far as I remember, I opened SPARK-56694 /
>>>>>>>>> https://github.com/apache/spark/pull/55644 because I ran into
>>>>>>>>> that minor bug during the implementation of
>>>>>>>>> https://github.com/apache/spark/pull/55298.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Sorry, I didn't check whether a ticket or PR already existed.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The below I am addressing to the whole cartel:
>>>>>>>> I have experienced this before, as recent as couple of months back
>>>>>>>> ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>> I have experienced my personal effort (going into weeks) to
>>>>>>>> debug and reproduce an issue reliably being hijacked by members, without
>>>>>>>> even discussing the fix proposed (by opening new PRs). (If interested, I
>>>>>>>> can provide details of the PRs / issues I am talking about.)
>>>>>>>> I have seen a perfectly valid PR being nixed , by following comment
>>>>>>>> which essentially said
>>>>>>>> "  my code of making the cache lookup more effective , would result
>>>>>>>> in greater chances of stale cache being picked,  which already spark
>>>>>>>> suffers from."
>>>>>>>> Now the PR was related to collapsing the projects in analysis
>>>>>>>> phase, and side effect was cache pick up being more sensitive.
>>>>>>>> So this is such a frivolous reason to nix the PR, because
>>>>>>>> "staleness" is an underlying existing issue which had nothing to do
>>>>>>>> with my PR. And it's more amusing that if a DB is giving even one wrong
>>>>>>>> result in millions, that makes all the results suspect in any case. It
>>>>>>>> does not matter at what frequency this occurs. To me the real reason was
>>>>>>>> code complexity (and more likely the loss of control of the code to the
>>>>>>>> outsider).
>>>>>>>>
>>>>>>>> The reason I call this open source community a cartel is because
>>>>>>>> I have seen the way it works pretty closely and have experienced it in
>>>>>>>> the email exchanges which happen on this group.
>>>>>>>> For the same PR, same issue, if advertently or inadvertently
>>>>>>>> another person (especially a member) gets his changes pushed, by
>>>>>>>> virtue of his standing/position and the "for profit" company the person
>>>>>>>> works for, how would you give the credit to the original person who
>>>>>>>> discovered the issue first / provided the fix?
>>>>>>>> Why are issues filed by some immediately worked upon by members
>>>>>>>> (some of whom claim to be working full time on Spark)? Is it because
>>>>>>>> certain companies / groups (for-profit companies, mind you) exert
>>>>>>>> undue control, or because the petty newbie has to be in the good books
>>>>>>>> of members (with the hope that at some point they will also reach that
>>>>>>>> position of power)?
>>>>>>>>
>>>>>>>> Given the AI advent and such occurrences, how will you give due
>>>>>>>> credit to the original creators, and how do you plan to prevent some
>>>>>>>> member from taking up the idea of any old open PR (left open for
>>>>>>>> reasons of complexity and non-technical reasons), polishing it up and
>>>>>>>> pushing it as their own?
>>>>>>>>
>>>>>>>> I am also curious , am I the only one who is troubled by all this,
>>>>>>>> or there are others who have experienced it?
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Asif
>>>>>>>>
>>>>>>>>
>>>>>>>>> If you have further improvements please feel free to open a PR.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> I had filed a bug
>>>>>>>>>>  https://issues.apache.org/jira/browse/SPARK-45866
>>>>>>>>>>
>>>>>>>>>> I had also opened a PR for the same.
>>>>>>>>>>
>>>>>>>>>> Now I see that the ticket I  filed is still open, but the issue
>>>>>>>>>> has been fixed using a new ticket
>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>>>>>>>>
>>>>>>>>>> and on top of that the bug test and of course the fix (which in
>>>>>>>>>> any case would be the same) has been taken from my PR for
>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>>>>>>>>
>>>>>>>>>> To me this is clear unethical conduct of cartel member, unless I
>>>>>>>>>> am missing some valid reason.
>>>>>>>>>>
>>>>>>>>>> And the irony is that the fix is still incomplete, as I just
>>>>>>>>>> found and filed a new ticket
>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-57126
>>>>>>>>>>
>>>>>>>>>> I know that at least some cartel members are insecure and think of
>>>>>>>>>> OSS as their fiefdom, but this sort of behaviour I never expected.
>>>>>>>>>> Regards
>>>>>>>>>> Asif
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
