Based on the data I have and have discussed, it is my view that the PRs you
opened were reactive, happening only after I had opened the initial ticket and
PRs.
You mention simplifying the issue in
https://github.com/apache/spark/pull/50757#discussion_r2069390537.
I am willing to discuss, here or in a meeting with other members of your
open source group, how it simplifies the issue.

In fact, I had repeatedly asked why we were discussing, in an internal
company channel, a PR that I had filed in public open source. In that
discussion (the last one before I was made redundant by the company), I had
given a detailed explanation of why making each plan node emit
indeterministic is a bad idea. (I would ask you to make that last Slack
discussion public, but I am sure that would be an issue, as your company
policy might prohibit it.)

I understood much earlier why you and your colleague never wanted technical
discussions of my public PRs on the PRs themselves.

The same holds for the other alternate PRs, including the one for the "self
joins" issue.
I am willing to discuss with your group members the problem my PR solves
and what your alternative PR does not.


I am not sure whether this is a general approach among the "members" to
ensure that the final check-in happens under their authorship.

On Fri, May 29, 2026, 1:58 AM Peter Toth <[email protected]> wrote:

> Hi Viquar,
>
>> To resolve the immediate discrepancy, I ask that we formally link
>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
>> add a JIRA comment explicitly crediting Asif as the original co-discoverer
>> of both the regression and the baseline fix. This standard attribution
>> costs us nothing but preserves the integrity of our commit history.
>
>
> SPARK-56694 is a duplicate of SPARK-45658 (not SPARK-45866), but I agree
> it's a fair point to link the tickets and mention Asif's previous work. Let
> me add a comment to both the ticket and the PR.
>
>> Conversely, SPARK-56694 bypassed the queue and was merged within eight
>> hours
>
>
> I don't know, is there a queue? As for my work process, when I have some
> time for upstream reviews, I don't follow any queue. I just pick PRs that I
> find interesting or that relate to my experience with Spark. And despite
> its size, https://github.com/apache/spark/pull/55644/changes is
> technically just a one-liner, fairly trivial fix so review within 8 hours
> isn't extraordinary.
>
> Hi Asif,
>
>> you opened an alternate PR, which...
>> What issue did u see in the logic, that an alternate PR was opened...
>
>
> I think the reason for my simplification approach was discussed both
> offline and online in this thread:
> https://github.com/apache/spark/pull/50757#discussion_r2069390537
>
> Best,
> Peter
>
> On Thu, May 28, 2026 at 10:29 PM vaquar khan <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I have thoroughly reviewed the technical artifacts surrounding the recent
>> Catalyst optimizer canonicalization discussions to help guide this toward a
>> constructive resolution.
>>
>> We must address a tangible breakdown in our review pipeline. SPARK-45866
>> and its corresponding PR #49154 correctly identified this complex Catalyst
>> regression in late 2023, yet the ticket remained unaddressed. *Conversely,
>> SPARK-56694 bypassed the queue and was merged within eight hours without
>> referencing the prior art*. Peter has transparently acknowledged the
>> oversight in searching for existing tickets, but we still need to close the
>> loop.
>>
>> To resolve the immediate discrepancy, *I ask that we formally link
>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
>> add a JIRA comment explicitly crediting Asif as the original co-discoverer
>> of both the regression and the baseline fix. This standard attribution
>> costs us nothing but preserves the integrity of our commit history.*
>>
>> Stepping back, this incident highlights a critical systemic risk to our
>> contributor ecosystem. The stark asymmetry in review velocity, where an
>> external contributor's highly complex PR sits stagnant for months or years
>> while an identical internal PR is merged in hours, creates visible friction.
>> Even if entirely unintentional due to organizational overload, this pattern
>> discourages the high-level engineering talent required to sustain the
>> project's momentum.
>>
>> To maintain Spark’s technical leadership, we must actively cultivate a
>> culture where contributions are prioritized strictly by their architectural
>> merit, regardless of authorship. Furthermore, we must normalize the habit
>> of proactively acknowledging independent work when parallel discoveries
>> surface. Small, intentional shifts in our governance and review cadence
>> will yield massive dividends in community trust and long-term innovation.
>>
>> Best regards,
>> Viquar Khan
>> https://www.linkedin.com/in/vaquar-khan-b695577/?skipRedirect=true
>>
>>
>> On Thu, 28 May 2026 at 13:42, Asif Shahid <[email protected]> wrote:
>>
>>> Also, I must admit that I did not know OSS works by opening alternate
>>> PRs.
>>> In the places where I have worked most of my life, we work on the opened
>>> PR with the original author and try to bridge the gap.
>>>
>>> On Thu, May 28, 2026, 11:25 AM Asif Shahid <[email protected]>
>>> wrote:
>>>
>>>> In fact, I showed it not just to you but to another colleague of yours
>>>> too. But there has been absolutely no comment on that from then until
>>>> now.
>>>>
>>>> On Thu, May 28, 2026 at 11:19 AM Asif Shahid <[email protected]>
>>>> wrote:
>>>>
>>>>> Also take a look at this JIRA:
>>>>> https://issues.apache.org/jira/browse/SPARK-47320
>>>>> An alternate PR was opened for this one as well.
>>>>> This problem is so deep in the code that I even showed you that, in the
>>>>> existing test itself, if the join condition's operands are swapped, the
>>>>> test fails. Self joins are completely broken.
>>>>> I had proposed a consistent fix, which solves the issue completely and
>>>>> logically, but again an alternate PR was filed.
>>>>> What issue was there in the PR that I created?
>>>>> Regards
>>>>> Asif
>>>>>
>>>>> On Thu, May 28, 2026 at 11:14 AM Asif Shahid <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 28, 2026 at 10:56 AM Peter Toth <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> As for the fix, itself, is not indicative of any thing as its a one
>>>>>>>> liner, test has uncanny resemblance
>>>>>>>
>>>>>>>
>>>>>>> Asif, what exactly is the "uncanny resemblance" between those test
>>>>>>> cases in https://github.com/apache/spark/pull/49154/changes vs
>>>>>>> https://github.com/apache/spark/pull/55644/changes ? Besides the
>>>>>>> fact that obviously they are comparing canonicalized forms.
>>>>>>> Again, sorry for not noticing your PR, but I don't feel my fix has
>>>>>>> anything to do with yours.
>>>>>>>
>>>>>> OK. I respect your opinion. Everyone is entitled to their own view.
>>>>>>
>>>>>>>
>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2- 3
>>>>>>>> weeks. I filed a PR. The bug was fixed via a different PR , taken a
>>>>>>>> different route.
>>>>>>>
>>>>>>>
>>>>>>> Do you see anything in common between
>>>>>>> https://github.com/apache/spark/pull/50029/changes and
>>>>>>> https://github.com/apache/spark/pull/50757/changes ?
>>>>>>> Because I do see. That someone else had a much better idea:
>>>>>>> https://github.com/apache/spark/pull/50757#issuecomment-2844972082
>>>>>>> / https://github.com/apache/spark/pull/50230 and it was implemented
>>>>>>> for the benefit of Spark.
>>>>>>> IMO, that's the normal way of dealing with issues in an open-source
>>>>>>> project. Ideas come and go and hopefully the one best wins.
>>>>>>>
>>>>>> The checksum approach has its expense. That can come later, because
>>>>>> a priori it is possible to detect whether the expression is returning a
>>>>>> value from an indeterministic expression.
>>>>>> You opened an alternate PR. As I described in the PR discussion, to fix
>>>>>> the round robin issue you were dealing with, you are trying to impose an
>>>>>> order on indeterministic expression evaluation, which itself is against
>>>>>> the basic premise that if data is indeterminate, there cannot be order
>>>>>> in it.
>>>>>> What issue did you see in the logic, that an alternate PR was
>>>>>> opened...which impacted all the stages (including the ancestors?) and I
>>>>>> already discussed internally why the idea you had in mind would not
>>>>>> work. I specifically asked, why don't we discuss via the PR filed...
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> On Thu, May 28, 2026 at 6:38 PM Asif Shahid <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Nicholas,
>>>>>>>> You wanted some examples, right?
>>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2-3
>>>>>>>> weeks. I filed a PR. The bug was fixed via a different PR, which took a
>>>>>>>> different route.
>>>>>>>> Did anyone who created the new PR and route point out any
>>>>>>>> unaddressable logical issue?
>>>>>>>> The same goes for all the PRs ( which in case I have closed)
>>>>>>>> Regards
>>>>>>>> Asif
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 28, 2026 at 9:06 AM Nicholas Chammas <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> I think repeatedly calling the contributors on this list a
>>>>>>>>> “cartel” is not conducive to a calm and amicable resolution.
>>>>>>>>>
>>>>>>>>> You may have some history built up that led you to use that word,
>>>>>>>>> but to the rest of us it comes out of nowhere; you in fact opened this
>>>>>>>>> thread with that attack. If you keep making your case in this manner,
>>>>>>>>> you will just turn everyone against you.
>>>>>>>>>
>>>>>>>>> If there is a history of what you feel is others stealing your
>>>>>>>>> work, please link to a few examples so we can see what you are
>>>>>>>>> seeing. If you can’t do that, then just focus on this current example.
>>>>>>>>> And try to refrain from calling people names unless your goal is just
>>>>>>>>> to have a fight, as opposed to resolving the problematic behavior so
>>>>>>>>> you can continue to contribute.
>>>>>>>>>
>>>>>>>>> I am not a committer and don’t have any special role in this
>>>>>>>>> community. I am speaking just as an observer and regular contributor
>>>>>>>>> to the project.
>>>>>>>>>
>>>>>>>>> > I have experienced this before, as recent as couple of months
>>>>>>>>> back ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>>
>>>>>>>>> For others following along, I took a look at this ticket and the
>>>>>>>>> associated PRs: #53261
>>>>>>>>> <https://github.com/apache/spark/pull/53261> / #53100
>>>>>>>>> <https://github.com/apache/spark/pull/53100>
>>>>>>>>>
>>>>>>>>> It looks like Asif is upset that he submitted a fix for the same
>>>>>>>>> issue a week or so prior to the fix that eventually got merged. But
>>>>>>>>> the fixes are different, and the one that got merged is a lot shorter,
>>>>>>>>> though they are both simple. The PR that got merged was submitted by
>>>>>>>>> someone who appears to be employed by Databricks; perhaps this is part
>>>>>>>>> of the “cartel” accusation. The two PRs were reviewed by different
>>>>>>>>> committers, however, and the one that got merged was merged in by
>>>>>>>>> someone who does _not_ work for Databricks.
>>>>>>>>>
>>>>>>>>> I don’t see anything here other than the normal dynamic of a large
>>>>>>>>> and busy open source project. Committer attention is limited; things
>>>>>>>>> fall through the cracks; different contributors may occasionally work
>>>>>>>>> on the same thing without knowing about each other. A minor help to
>>>>>>>>> this specific problem would be to have some way of automatically
>>>>>>>>> linking issues that appear to be about the same thing.
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On May 28, 2026, at 11:33 AM, Asif Shahid <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>> Pls see inline for comments/ replies
>>>>>>>>>
>>>>>>>>> On Thu, May 28, 2026 at 6:11 AM Peter Toth <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Asif,
>>>>>>>>>>
>>>>>>>>>> Are you referring to
>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs.
>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes? Those are
>>>>>>>>>> definitely solving the same issue but I can assure you I wouldn't
>>>>>>>>>> take any code from your PR without consulting with you first.
>>>>>>>>>>
>>>>>>>>> Yes indeed, Peter, I am referring to those.
>>>>>>>>> The fix itself is not indicative of anything, as it is a one-liner,
>>>>>>>>> but the test has an uncanny resemblance.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> As far as I remember, I opened SPARK-56694 /
>>>>>>>>>> https://github.com/apache/spark/pull/55644 because I ran into
>>>>>>>>>> that minor bug during the implementation of
>>>>>>>>>> https://github.com/apache/spark/pull/55298.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Sorry, I didn't check whether a ticket or PR already existed.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The below I am addressing to the whole cartel:
>>>>>>>>> I have experienced this before, as recently as a couple of months back
>>>>>>>>> ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>> I have experienced my personal effort (going into weeks) to debug and
>>>>>>>>> reproduce an issue reliably being hijacked by members who opened new
>>>>>>>>> PRs without even discussing the proposed fix. (If interested, I can
>>>>>>>>> provide details of the PRs / issues I am talking about.)
>>>>>>>>> I have seen a perfectly valid PR being nixed by the following
>>>>>>>>> comment, which essentially said:
>>>>>>>>> "my code for making the cache lookup more effective would result in
>>>>>>>>> greater chances of a stale cache being picked, which Spark already
>>>>>>>>> suffers from."
>>>>>>>>> Now, the PR was related to collapsing the projects in the analysis
>>>>>>>>> phase, and the side effect was the cache pickup being more sensitive.
>>>>>>>>> So this is a frivolous reason to nix the PR, because "staleness" is an
>>>>>>>>> underlying existing issue which had nothing to do with my PR. And it
>>>>>>>>> is more amusing that if a DB gives even one wrong result in millions,
>>>>>>>>> that makes all the results suspect in any case. It does not matter at
>>>>>>>>> what frequency this occurs. To me the real reason was code complexity
>>>>>>>>> (and, more likely, the loss of control of the code to the outsider).
>>>>>>>>>
>>>>>>>>> The reason I call this open source community a cartel is that I
>>>>>>>>> have seen the way it works pretty closely and have experienced it in
>>>>>>>>> the email exchanges which happen on this group.
>>>>>>>>> For the same PR, same issue, if advertently or inadvertently another
>>>>>>>>> person (especially a member) gets his changes pushed by virtue of his
>>>>>>>>> standing/position and the "for profit" company the person works for,
>>>>>>>>> how would you give credit to the original person who discovered the
>>>>>>>>> issue first / provided the fix?
>>>>>>>>> Why are issues filed by some immediately worked upon by members
>>>>>>>>> (some of whom claim to be working full time on Spark)? Is it because
>>>>>>>>> certain companies / groups (for-profit companies, mind you) exert
>>>>>>>>> undue control, or because the petty newbie has to be in the good books
>>>>>>>>> of members (with the hope that at some point they will also reach that
>>>>>>>>> position of power)?
>>>>>>>>>
>>>>>>>>> Given the advent of AI and such occurrences, how will you give due
>>>>>>>>> credit to the original creators, and how do you plan to prevent some
>>>>>>>>> member from taking up the idea of any old open PR (left open for
>>>>>>>>> reasons of complexity and non-technical reasons), polishing it up and
>>>>>>>>> pushing it as their own?
>>>>>>>>>
>>>>>>>>> I am also curious: am I the only one who is troubled by all this,
>>>>>>>>> or are there others who have experienced it?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Asif
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> If you have further improvements please feel free to open a PR.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I had filed a bug
>>>>>>>>>>>  https://issues.apache.org/jira/browse/SPARK-45866
>>>>>>>>>>>
>>>>>>>>>>> I had also opened a PR for the same.
>>>>>>>>>>>
>>>>>>>>>>> Now I see that the ticket I  filed is still open, but the issue
>>>>>>>>>>> has been fixed using a new ticket
>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>>>>>>>>>
>>>>>>>>>>> and on top of that, the bug test and of course the fix (which in
>>>>>>>>>>> any case would be the same) have been taken from my PR:
>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>>>>>>>>>
>>>>>>>>>>> To me this is clearly unethical conduct by a cartel member, unless
>>>>>>>>>>> I am missing some valid reason.
>>>>>>>>>>>
>>>>>>>>>>> And the irony is that the fix is still incomplete, as I just
>>>>>>>>>>> found and filed a new ticket
>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-57126
>>>>>>>>>>>
>>>>>>>>>>> I know that at least some cartel members are insecure and think
>>>>>>>>>>> of OSS as their fiefdom, but this sort of behaviour I never
>>>>>>>>>>> expected.
>>>>>>>>>>> Regards
>>>>>>>>>>> Asif
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
