To the open source C,
As is apparent to me, and as I believe is tacitly admitted by the group in
general and as I have heard explicitly in person:
Any relatively complex PR that involves deeper thinking (be it a functional
or a performance issue) should be the business of a member.
If it's a performance issue, no way.
If it's a functional issue that is becoming an embarrassment to ignore,
somehow ensure that the push happens under a member's PR.

That is how exclusivity and good is to be maintained.


On Fri, May 29, 2026, 8:05 AM Asif Shahid <[email protected]> wrote:

> Based on the data I have and have discussed, it's my view that the PRs
> opened by you were reactive, happening only after I had opened the initial
> ticket and PRs.
> You are talking about simplifying the issue:
> https://github.com/apache/spark/pull/50757#discussion_r2069390537
> I am willing to discuss here, or over a meeting with other members of your
> open source group, how it simplifies.
>
> In fact, I had repeatedly asked why we were discussing, in the company's
> internal channel, a PR that I had filed in public open source. In that
> discussion (the last one before I was made redundant by the company), I
> had given a detailed explanation of why making each plan node emit
> indeterministic is a bad idea. (I would ask you to make that last Slack
> discussion public, but I am sure that would be an issue, as your company
> policy might prohibit it.)
>
> I understood much earlier why you and your colleague never wanted
> technical discussions about my public PRs on the PRs themselves.
>
>
>
> The same holds for the other alternate PRs, including the issue of "self
> joins".
> I am willing to discuss it with your group members: the problem it
> solves and what your alternative PR does not.
>
>
> I am not sure whether this is the general approach of the "members": to
> ensure that the final check-in happens under their authorship.
>
> On Fri, May 29, 2026, 1:58 AM Peter Toth <[email protected]> wrote:
>
>> Hi Viquar,
>>
>>> To resolve the immediate discrepancy, I ask that we formally link
>>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
>>> add a JIRA comment explicitly crediting Asif as the original co-discoverer
>>> of both the regression and the baseline fix. This standard attribution
>>> costs us nothing but preserves the integrity of our commit history.
>>
>>
>> SPARK-56694 is a duplicate of SPARK-45658 (not SPARK-45866), but I agree
>> it's a fair point to link the tickets and mention Asif's previous work. Let
>> me add a comment to both the ticket and the PR.
>>
>>> Conversely, SPARK-56694 bypassed the queue and was merged within eight
>>> hours
>>
>>
>> I don't know, is there a queue? As for my work process, when I have some
>> time for upstream reviews, I don't follow any queue. I just pick PRs that I
>> find interesting or that relate to my experience with Spark. And despite
>> its size, https://github.com/apache/spark/pull/55644/changes is
>> technically just a one-liner, fairly trivial fix so review within 8 hours
>> isn't extraordinary.
>>
>> Hi Asif,
>>
>>> you opened an alternate PR, which...
>>>
>>> What issue did u see in the logic, that an alternate PR was opened...
>>
>>
>> I think the reason for my simplification approach was discussed both
>> offline and online in this thread:
>> https://github.com/apache/spark/pull/50757#discussion_r2069390537
>>
>> Best,
>> Peter
>>
>> On Thu, May 28, 2026 at 10:29 PM vaquar khan <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have thoroughly reviewed the technical artifacts surrounding the
>>> recent Catalyst optimizer canonicalization discussions to help guide this
>>> toward a constructive resolution.
>>>
>>> We must address a tangible breakdown in our review pipeline. SPARK-45866
>>> and its corresponding PR #49154 correctly identified this complex Catalyst
>>> regression in late 2023, yet the ticket remained unaddressed. *Conversely,
>>> SPARK-56694 bypassed the queue and was merged within eight hours without
>>> referencing the prior art*. Peter has transparently acknowledged the
>>> oversight in searching for existing tickets, but we still need to close the
>>> loop.
>>>
>>> To resolve the immediate discrepancy, *I ask that we formally link
>>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
>>> add a JIRA comment explicitly crediting Asif as the original co-discoverer
>>> of both the regression and the baseline fix. This standard attribution
>>> costs us nothing but preserves the integrity of our commit history.*
>>>
>>> Stepping back, this incident highlights a critical systemic risk to our
>>> contributor ecosystem. The stark asymmetry in review velocity, where an
>>> external contributor's highly complex PR sits stagnant for months or years
>>> while an identical internal PR is merged in hours, creates visible friction.
>>> Even if entirely unintentional due to organizational overload, this pattern
>>> discourages the high-level engineering talent required to sustain the
>>> project's momentum.
>>>
>>> To maintain Spark’s technical leadership, we must actively cultivate a
>>> culture where contributions are prioritized strictly by their architectural
>>> merit, regardless of authorship. Furthermore, we must normalize the habit
>>> of proactively acknowledging independent work when parallel discoveries
>>> surface. Small, intentional shifts in our governance and review cadence
>>> will yield massive dividends in community trust and long-term innovation.
>>>
>>> Best regards,
>>> Viquar Khan
>>> https://www.linkedin.com/in/vaquar-khan-b695577/?skipRedirect=true
>>>
>>>
>>> On Thu, 28 May 2026 at 13:42, Asif Shahid <[email protected]> wrote:
>>>
>>>> Also, I must admit that I did not know OSS works by opening alternate
>>>> PRs.
>>>> In the places where I have worked most of my life, we work on the
>>>> opened PR with the original author and try to bridge the gap.
>>>>
>>>> On Thu, May 28, 2026, 11:25 AM Asif Shahid <[email protected]>
>>>> wrote:
>>>>
>>>>> In fact, I showed it not just to you but to another colleague of yours
>>>>> too. But there has been absolutely no comment or anything on that from
>>>>> then till now.
>>>>>
>>>>> On Thu, May 28, 2026 at 11:19 AM Asif Shahid <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Also take a look at this JIRA:
>>>>>> https://issues.apache.org/jira/browse/SPARK-47320
>>>>>> For this, too, an alternate PR was opened.
>>>>>> This problem is so deep in the code that I even showed you that, in the
>>>>>> existing test itself, if the join condition's operands are swapped, the
>>>>>> test fails. Self joins are completely broken.
>>>>>> I had proposed a consistent fix, which solves the issue completely
>>>>>> and logically, but again an alternate PR was filed.
>>>>>> What issue was there in the PR that I created?
>>>>>> Regards
>>>>>> Asif
>>>>>>
>>>>>> On Thu, May 28, 2026 at 11:14 AM Asif Shahid <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 28, 2026 at 10:56 AM Peter Toth <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>> As for the fix, itself, is not indicative of any thing as its a one
>>>>>>>>> liner, test has uncanny resemblance
>>>>>>>>
>>>>>>>>
>>>>>>>> Asif, what exactly is the "uncanny resemblance" between those test
>>>>>>>> cases in https://github.com/apache/spark/pull/49154/changes vs
>>>>>>>> https://github.com/apache/spark/pull/55644/changes ? Besides the
>>>>>>>> fact that obviously they are comparing canonicalized forms.
>>>>>>>> Again, sorry for not noticing your PR, but I don't feel my fix has
>>>>>>>> anything to do with yours.
>>>>>>>>
>>>>>>> Ok. I respect your opinion. Everyone is entitled to their own view.
>>>>>>>
>>>>>>>>
>>>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2-3
>>>>>>>>> weeks. I filed a PR. The bug was fixed via a different PR , taken a
>>>>>>>>> different route.
>>>>>>>>
>>>>>>>>
>>>>>>>> Do you see anything in common between
>>>>>>>> https://github.com/apache/spark/pull/50029/changes and
>>>>>>>> https://github.com/apache/spark/pull/50757/changes ?
>>>>>>>> Because I do: someone else had a much better idea,
>>>>>>>> https://github.com/apache/spark/pull/50757#issuecomment-2844972082
>>>>>>>> / https://github.com/apache/spark/pull/50230 and it was
>>>>>>>> implemented for the benefit of Spark.
>>>>>>>> IMO, that's the normal way of dealing with issues in an open-source
>>>>>>>> project. Ideas come and go and hopefully the best one wins.
>>>>>>>>
>>>>>>> The checksum approach has its expense. That can come later, because
>>>>>>> a priori it's possible to detect whether the expression is returning a
>>>>>>> value from an indeterministic expression.
>>>>>>> You opened an alternate PR. As I described in the PR discussion, to
>>>>>>> fix the round-robin issue you were dealing with, you are trying to
>>>>>>> impose an order on indeterministic expression evaluation, which itself
>>>>>>> goes against the basic premise that if data is indeterminate, there
>>>>>>> cannot be an order in it.
>>>>>>> What issue did you see in the logic that an alternate PR was opened,
>>>>>>> one which impacted all the stages (including the ancestors)? I had
>>>>>>> already discussed internally why the idea you had in mind would not
>>>>>>> work, and I specifically asked why we don't discuss via the PR filed.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On Thu, May 28, 2026 at 6:38 PM Asif Shahid <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Nicholas,
>>>>>>>>> You wanted some examples, right?
>>>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2-3
>>>>>>>>> weeks. I filed a PR. The bug was fixed via a different PR that took a
>>>>>>>>> different route.
>>>>>>>>> Did anyone who created the new PR and route point out any
>>>>>>>>> unaddressable logical issue?
>>>>>>>>> The same goes for all the PRs (which, in any case, I have closed).
>>>>>>>>> Regards
>>>>>>>>> Asif
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, May 28, 2026 at 9:06 AM Nicholas Chammas <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I think repeatedly calling the contributors on this list a
>>>>>>>>>> “cartel” is not conducive to a calm and amicable resolution.
>>>>>>>>>>
>>>>>>>>>> You may have some history built up that led you to use that word,
>>>>>>>>>> but to the rest of us it comes out of nowhere; you in fact opened
>>>>>>>>>> this thread with that attack. If you keep making your case in this
>>>>>>>>>> manner, you will just turn everyone against you.
>>>>>>>>>>
>>>>>>>>>> If there is a history of what you feel is others stealing your
>>>>>>>>>> work, please link to a few examples so we can see what you are
>>>>>>>>>> seeing. If you can’t do that, then just focus on this current
>>>>>>>>>> example. And try to refrain from calling people names unless your
>>>>>>>>>> goal is just to have a fight, as opposed to resolving the
>>>>>>>>>> problematic behavior so you can continue to contribute.
>>>>>>>>>>
>>>>>>>>>> I am not a committer and don’t have any special role in this
>>>>>>>>>> community. I am speaking just as an observer and regular
>>>>>>>>>> contributor to the project.
>>>>>>>>>>
>>>>>>>>>> > I have experienced this before, as recent as couple of months
>>>>>>>>>> back ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>>>
>>>>>>>>>> For others following along, I took a look at this ticket and the
>>>>>>>>>> associated PRs: #53261
>>>>>>>>>> <https://github.com/apache/spark/pull/53261> / #53100
>>>>>>>>>> <https://github.com/apache/spark/pull/53100>
>>>>>>>>>>
>>>>>>>>>> It looks like Asif is upset that he submitted a fix for the same
>>>>>>>>>> issue a week or so prior to the fix that eventually got merged. But
>>>>>>>>>> the fixes are different, and the one that got merged is a lot
>>>>>>>>>> shorter, though they are both simple. The PR that got merged was
>>>>>>>>>> submitted by someone who appears to be employed by Databricks;
>>>>>>>>>> perhaps this is part of the “cartel” accusation. The two PRs were
>>>>>>>>>> reviewed by different committers, however, and the one that got
>>>>>>>>>> merged was merged in by someone who does _not_ work for Databricks.
>>>>>>>>>>
>>>>>>>>>> I don’t see anything here other than the normal dynamic of a
>>>>>>>>>> large and busy open source project. Committer attention is limited;
>>>>>>>>>> things fall through the cracks; different contributors may
>>>>>>>>>> occasionally work on the same thing without knowing about each
>>>>>>>>>> other. A minor help to this specific problem would be to have some
>>>>>>>>>> way of automatically linking issues that appear to be about the
>>>>>>>>>> same thing.
>>>>>>>>>>
>>>>>>>>>> Nick
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On May 28, 2026, at 11:33 AM, Asif Shahid <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>> Please see inline for comments/replies.
>>>>>>>>>>
>>>>>>>>>> On Thu, May 28, 2026 at 6:11 AM Peter Toth <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Asif,
>>>>>>>>>>>
>>>>>>>>>>> Are you referring to
>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs.
>>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes? Those are
>>>>>>>>>>> definitely solving the same issue but I can assure you I wouldn't
>>>>>>>>>>> take any code from your PR without consulting with you first.
>>>>>>>>>>>
>>>>>>>>>> Yes indeed, Peter, I am referring to those.
>>>>>>>>>> As for the fix itself, it is not indicative of anything, as it's a
>>>>>>>>>> one-liner; the test has an uncanny resemblance.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> As far as I remember, I opened SPARK-56694 /
>>>>>>>>>>> https://github.com/apache/spark/pull/55644 because I ran into
>>>>>>>>>>> that minor bug during the implementation of
>>>>>>>>>>> https://github.com/apache/spark/pull/55298.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Sorry, I didn't check whether a ticket or PR already existed.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The following I am addressing to the whole cartel:
>>>>>>>>>> I have experienced this before, as recently as a couple of months
>>>>>>>>>> back ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>>> I have experienced my personal effort (running into weeks) to
>>>>>>>>>> debug and reliably reproduce an issue being hijacked by members who
>>>>>>>>>> opened new PRs without even discussing the proposed fix. (If
>>>>>>>>>> interested, I can provide details of the PRs/issues I am talking
>>>>>>>>>> about.)
>>>>>>>>>> I have seen a perfectly valid PR being nixed by the following
>>>>>>>>>> comment, which essentially said:
>>>>>>>>>> "my code for making the cache lookup more effective would
>>>>>>>>>> result in greater chances of a stale cache being picked, which
>>>>>>>>>> Spark already suffers from."
>>>>>>>>>> Now, the PR was related to collapsing the projects in the analysis
>>>>>>>>>> phase, and a side effect was cache pickup being more sensitive.
>>>>>>>>>> So this is a frivolous reason to nix the PR, because "staleness"
>>>>>>>>>> is an underlying existing issue that had nothing to do with my PR.
>>>>>>>>>> And it's more amusing given that if a DB gives even one wrong
>>>>>>>>>> result in millions, that makes all the results suspect in any
>>>>>>>>>> case. It does not matter at what frequency this occurs. To me the
>>>>>>>>>> real reason was code complexity (and, more likely, the loss of
>>>>>>>>>> control of the code to an outsider).
>>>>>>>>>>
>>>>>>>>>> The reason I call this open source community a cartel is that I
>>>>>>>>>> have seen the way it works pretty closely and have experienced it
>>>>>>>>>> in the email exchanges that happen on this group.
>>>>>>>>>> For the same PR and the same issue, if, advertently or
>>>>>>>>>> inadvertently, another person (especially a member) gets his
>>>>>>>>>> changes pushed by virtue of his standing/position and the "for
>>>>>>>>>> profit" company the person works for, how would you give credit to
>>>>>>>>>> the original person who first discovered the issue / provided the
>>>>>>>>>> fix?
>>>>>>>>>> Why are issues filed by some immediately worked upon by members
>>>>>>>>>> (some of whom claim to be working full time on Spark)? Is it
>>>>>>>>>> because certain companies/groups (for-profit companies, mind you)
>>>>>>>>>> exert undue control, or because the petty newbie has to be in the
>>>>>>>>>> good books of members (in the hope that at some point they too
>>>>>>>>>> will reach that position of power)?
>>>>>>>>>>
>>>>>>>>>> Given the advent of AI and such occurrences, how will you give due
>>>>>>>>>> credit to the original creators, and how do you plan to prevent a
>>>>>>>>>> member from taking up the idea of any old open PR (one left open
>>>>>>>>>> because of its complexity or for non-technical reasons), polishing
>>>>>>>>>> it up, and pushing it as their own?
>>>>>>>>>>
>>>>>>>>>> I am also curious: am I the only one who is troubled by all this,
>>>>>>>>>> or are there others who have experienced it?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Asif
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> If you have further improvements please feel free to open a PR.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I had filed a bug
>>>>>>>>>>>>  https://issues.apache.org/jira/browse/SPARK-45866
>>>>>>>>>>>>
>>>>>>>>>>>> I had also opened a PR for the same.
>>>>>>>>>>>>
>>>>>>>>>>>> Now I see that the ticket I filed is still open, but the issue
>>>>>>>>>>>> has been fixed using a new ticket
>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>>>>>>>>>>
>>>>>>>>>>>> and on top of that, the bug test and of course the fix (which in
>>>>>>>>>>>> any case would be the same) have been taken from my PR:
>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>>>>>>>>>>
>>>>>>>>>>>> To me this is clearly unethical conduct by a cartel member, unless
>>>>>>>>>>>> I am missing some valid reason.
>>>>>>>>>>>>
>>>>>>>>>>>> And the irony is that the fix is still incomplete, as I just
>>>>>>>>>>>> found and filed a new ticket
>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-57126
>>>>>>>>>>>>
>>>>>>>>>>>> I know that at least some cartel members are insecure and think
>>>>>>>>>>>> of OSS as their fiefdom, but I never expected this sort of
>>>>>>>>>>>> behaviour.
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Asif
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
