This control, exclusivity, the requirement to build credibility by starting small (like fixing formatting, stats, logs), leaving complex issues to other "bright" minds, the informal hierarchy: maybe this is how open source works. Whatever it is, it does not sound like a "community" to me. This is a club ("cartel" is offensive to some, or most), with the usual struggle for power, control and politics.

On Fri, May 29, 2026, 8:31 AM Asif Shahid <[email protected]> wrote:

> The last line is to be read as
>
> > That is how exclusivity and good control is to be maintained.
>
> On Fri, May 29, 2026, 8:29 AM Asif Shahid <[email protected]> wrote:
>
>> To the open source C,
>> As it's apparent to me and I believe tacitly admitted by the group in
>> general and heard explicitly in person
>> Any relatively complex PR which involves deeper thinking ( be it
>> functional or performance issue) should be the business of member.
>> If it's performance issue , no way .
>> If it's functional issue which is becoming embarrassment to ignore,
>> somehow ensure that the push happens under a member's PR.
>>
>> That is how exclusivity and good is to be maintained.
>>
>> On Fri, May 29, 2026, 8:05 AM Asif Shahid <[email protected]> wrote:
>>
>>> Based on the data I have and discussed, it's my view that the PRs opened
>>> by you were reactive, happening only after I had opened the initial ticket
>>> and PRs.
>>> You are talking about simplifying the issue
>>> https://github.com/apache/spark/pull/50757#discussion_r2069390537,
>>> I am willing to discuss it here ,over meeting with other members of
>>> your open source group, as to how it simplifies?
>>>
>>> In fact , I had repeatedly said that why are we discussing in internal
>>> channel of company for the PR which I had filed in public Open source . In
>>> that discussion ( the last one, before I was made redundant by company), I
>>> had given detailed explanation of why making each plan node emit
>>> indeterministic is bad idea. ( I would ask you to make that last slack
>>> public, but I am sure that would be an issue as your company policy might
>>> prohibit).
>>>
>>> I understood much earlier why you and your colleague never wanted
>>> technical discussions on my public PRs on PR itself..
>>>
>>> The same holds for other alternate PRs including the issue of "self
>>> joins".
>>> I am willing to discuss it out with your group members, the problem it
>>> solves and what your alternative PR does not.
>>>
>>> I am not sure if this is generic approach of the "members", to ensure
>>> that final checkin happens under their authorship.
>>>
>>> On Fri, May 29, 2026, 1:58 AM Peter Toth <[email protected]> wrote:
>>>
>>>> Hi Viquar,
>>>>
>>>> To resolve the immediate discrepancy, I ask that we formally link
>>>>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
>>>>> add a JIRA comment explicitly crediting Asif as the original co-discoverer
>>>>> of both the regression and the baseline fix. This standard attribution
>>>>> costs us nothing but preserves the integrity of our commit history.
>>>>
>>>> SPARK-56694 is a duplicate of SPARK-45658 (not SPARK-45866), but I
>>>> agree it's a fair point to link the tickets and mention Asif's previous
>>>> work. Let me add a comment to both the ticket and the PR.
>>>>
>>>> Conversely, SPARK-56694 bypassed the queue and was merged within eight
>>>>> hours
>>>>
>>>> I don't know, is there a queue? As for my work process, when I have
>>>> some time for upstream reviews, I don't follow any queue. I just pick PRs
>>>> that I find interesting or that relate to my experience with Spark. And
>>>> despite its size, https://github.com/apache/spark/pull/55644/changes
>>>> is technically just a one-liner, fairly trivial fix so review within 8
>>>> hours isn't extraordinary.
>>>>
>>>> Hi Asif,
>>>>
>>>> you opened an alternate PR, which...
>>>>>
>>>> What issue did u see in the logic, that an alternate PR was opened...
>>>>
>>>> I think the reason for my simplification approach was discussed both
>>>> offline and online in this thread:
>>>> https://github.com/apache/spark/pull/50757#discussion_r2069390537
>>>>
>>>
>>>> Best,
>>>> Peter
>>>>
>>>> On Thu, May 28, 2026 at 10:29 PM vaquar khan <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have thoroughly reviewed the technical artifacts surrounding the
>>>>> recent Catalyst optimizer canonicalization discussions to help guide this
>>>>> toward a constructive resolution.
>>>>>
>>>>> We must address a tangible breakdown in our review pipeline.
>>>>> SPARK-45866 and its corresponding PR #49154 correctly identified this
>>>>> complex Catalyst regression in late 2023, yet the ticket remained
>>>>> unaddressed. *Conversely, SPARK-56694 bypassed the queue and was
>>>>> merged within eight hours without referencing the prior art*. Peter
>>>>> has transparently acknowledged the oversight in searching for existing
>>>>> tickets, but we still need to close the loop.
>>>>>
>>>>> To resolve the immediate discrepancy,* I ask that we formally link
>>>>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
>>>>> add a JIRA comment explicitly crediting Asif as the original co-discoverer
>>>>> of both the regression and the baseline fix. This standard attribution
>>>>> costs us nothing but preserves the integrity of our commit history. *
>>>>>
>>>>> Stepping back, this incident highlights a critical systemic risk to
>>>>> our contributor ecosystem. The stark asymmetry in review velocity where an
>>>>> external contributor's highly complex PR sits stagnant for months/years,
>>>>> while an identical internal PR is merged in hours creates visible friction.
>>>>> Even if entirely unintentional due to organizational overload, this pattern
>>>>> discourages the high-level engineering talent required to sustain the
>>>>> project's momentum.
>>>>>
>>>>> To maintain Spark’s technical leadership, we must actively cultivate a
>>>>> culture where contributions are prioritized strictly by their architectural
>>>>> merit, regardless of authorship. Furthermore, we must normalize the habit
>>>>> of proactively acknowledging independent work when parallel discoveries
>>>>> surface. Small, intentional shifts in our governance and review cadence
>>>>> will yield massive dividends in community trust and long-term innovation.
>>>>>
>>>>> Best regards,
>>>>> Viquar Khan
>>>>> https://www.linkedin.com/in/vaquar-khan-b695577/?skipRedirect=true
>>>>>
>>>>> On Thu, 28 May 2026 at 13:42, Asif Shahid <[email protected]> wrote:
>>>>>
>>>>>> Also I must admit that I did not know oss works by opening alternate PRs.
>>>>>> In the places where I have worked most of my life, we work on the
>>>>>> opened PR with the original author and try to bridge the gap.
>>>>>>
>>>>>> On Thu, May 28, 2026, 11:25 AM Asif Shahid <[email protected]> wrote:
>>>>>>
>>>>>>> In fact, I showed it not just to you but other colleague of yours
>>>>>>> too. But there has been absolutely no comment or anything on that from
>>>>>>> then , till now.
>>>>>>>
>>>>>>> On Thu, May 28, 2026 at 11:19 AM Asif Shahid <[email protected]> wrote:
>>>>>>>
>>>>>>>> also take a look at this jira
>>>>>>>> https://issues.apache.org/jira/browse/SPARK-47320
>>>>>>>> for this also an alternate PR was opened.
>>>>>>>> This problem is do deep in code, that I even showed you that in the
>>>>>>>> existing test itself, if the join condition's operand are swapped, test
>>>>>>>> fails.. Its completely broken , the self joins.
>>>>>>>> I had proposed a consistent fix, which solves the issue completely
>>>>>>>> and logically, but again an alternate PR was filed..
>>>>>>>> What issue was there in my PR , which I created...?
>>>>>>>> Regards
>>>>>>>> Asif
>>>>>>>>
>>>>>>>> On Thu, May 28, 2026 at 11:14 AM Asif Shahid <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Thu, May 28, 2026 at 10:56 AM Peter Toth <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> As for the fix, itself, is not indicative of any thing as its a
>>>>>>>>>>> one liner, test has uncanny resemblance
>>>>>>>>>>
>>>>>>>>>> Asif, what exactly is the "uncanny resemblance" between those
>>>>>>>>>> test cases in https://github.com/apache/spark/pull/49154/changes
>>>>>>>>>> vs https://github.com/apache/spark/pull/55644/changes ? Besides
>>>>>>>>>> the fact that obviously they are comparing canonicalized forms.
>>>>>>>>>> Again, sorry for not noticing your PR, but I don't feel my fix
>>>>>>>>>> has anything to do with yours.
>>>>>>>>>>
>>>>>>>>> Ok. I respect your opinion. Each one is entitled to its own view
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>>>>> To discover this bug and reproduce it reliably, I spent nearly
>>>>>>>>>>> 2- 3 weeks. I filed a PR. The bug was fixed via a different PR , taken a
>>>>>>>>>>> different route.
>>>>>>>>>>
>>>>>>>>>> Do you see anything in common between
>>>>>>>>>> https://github.com/apache/spark/pull/50029/changes and
>>>>>>>>>> https://github.com/apache/spark/pull/50757/changes ?
>>>>>>>>>> Because I do see. That someone else had a much better idea:
>>>>>>>>>> https://github.com/apache/spark/pull/50757#issuecomment-2844972082
>>>>>>>>>> / https://github.com/apache/spark/pull/50230 and it was
>>>>>>>>>> implemented for the benefit of Spark.
>>>>>>>>>> IMO, that's the normal way of dealing with issues in an
>>>>>>>>>> open-source project. Ideas come and go and hopefully the one best wins.
>>>>>>>>>>
>>>>>>>>> The checksum approach has its expense.
>>>>>>>>> That can come later ,
>>>>>>>>> because apriori its possible to detect whether the expression is returning
>>>>>>>>> value from an indeterministic expression.
>>>>>>>>> You opened an alternate PR, which I have described in the PR
>>>>>>>>> discussion that to fix the round robin issue which you were dealing with,
>>>>>>>>> you are trying to impose an order in in-deterministic expression
>>>>>>>>> evaluattion, which itself is against the basic premise that if data is
>>>>>>>>> in-determinate, there cannot be order in it.
>>>>>>>>> What issue did u see in the logic, that an alternate PR was
>>>>>>>>> opened...which impacted all the stages ( including the ancestors?) and I
>>>>>>>>> already discussed internally why the idea you had in mind would not work. I
>>>>>>>>> specifically asked, why dont we discuss via the PR filed...
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On Thu, May 28, 2026 at 6:38 PM Asif Shahid <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Nicholas,
>>>>>>>>>>> You wanted some examples , right:
>>>>>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>>>>> To discover this bug and reproduce it reliably, I spent nearly
>>>>>>>>>>> 2- 3 weeks. I filed a PR. The bug was fixed via a different PR , taken a
>>>>>>>>>>> different route.
>>>>>>>>>>> Did any one who created new PR and route, showed up any
>>>>>>>>>>> unaddressable logical issue?
>>>>>>>>>>> The same goes for all the PRs ( which in case I have closed)
>>>>>>>>>>> Regards
>>>>>>>>>>> Asif
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 28, 2026 at 9:06 AM Nicholas Chammas <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I think repeatedly calling the contributors on this list a
>>>>>>>>>>>> “cartel” is not conducive to a calm and amicable resolution.
>>>>>>>>>>>>
>>>>>>>>>>>> You may have some history built up that led you to use that
>>>>>>>>>>>> word, but to the rest of us it comes out of nowhere; you in fact opened
>>>>>>>>>>>> this thread with that attack. If you keep making your case in this manner,
>>>>>>>>>>>> you will just turn everyone against you.
>>>>>>>>>>>>
>>>>>>>>>>>> If there is a history of what you feel is others stealing your
>>>>>>>>>>>> work, please link to a few examples so we can see what you are seeing. If
>>>>>>>>>>>> you can’t do that, then just focus on this current example. And try to
>>>>>>>>>>>> refrain from calling people names unless your goal is just to have a fight,
>>>>>>>>>>>> as opposed to resolving the problematic behavior so you can continue to
>>>>>>>>>>>> contribute.
>>>>>>>>>>>>
>>>>>>>>>>>> I am not a committer and don’t have any special role in this
>>>>>>>>>>>> community. I am speaking just as an observer and regular contributor to the
>>>>>>>>>>>> project.
>>>>>>>>>>>>
>>>>>>>>>>>> > I have experienced this before, as recent as couple of months
>>>>>>>>>>>> back ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>>>>>
>>>>>>>>>>>> For others following along, I took a look at this ticket and
>>>>>>>>>>>> the associated PRs: #53261
>>>>>>>>>>>> <https://github.com/apache/spark/pull/53261> / #53100
>>>>>>>>>>>> <https://github.com/apache/spark/pull/53100>
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like Asif is upset that he submitted a fix for the
>>>>>>>>>>>> same issue a week or so prior to the fix that eventually got merged. But
>>>>>>>>>>>> the fixes are different, and the one that got merged is a lot shorter,
>>>>>>>>>>>> though they are both simple. The PR that got merged was submitted by
>>>>>>>>>>>> someone who appears to be employed by Databricks; perhaps this is part of
>>>>>>>>>>>> the “cartel” accusation.
>>>>>>>>>>>> The two PRs were reviewed by different committers,
>>>>>>>>>>>> however, and the one that got merged was merged in by someone who does
>>>>>>>>>>>> _not_ work for Databricks.
>>>>>>>>>>>>
>>>>>>>>>>>> I don’t see anything here other than the normal dynamic of a
>>>>>>>>>>>> large and busy open source project. Committer attention is limited; things
>>>>>>>>>>>> fall through the cracks; different contributors may occasionally work on
>>>>>>>>>>>> the same thing without knowing about each other. A minor help to this
>>>>>>>>>>>> specific problem would be to have some way of automatically linking issues
>>>>>>>>>>>> that appear to be about the same thing.
>>>>>>>>>>>>
>>>>>>>>>>>> Nick
>>>>>>>>>>>>
>>>>>>>>>>>> On May 28, 2026, at 11:33 AM, Asif Shahid <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>> Pls see inline for comments/ replies
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 28, 2026 at 6:11 AM Peter Toth <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Asif,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you referring to
>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs.
>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes? Those are
>>>>>>>>>>>>> definitely solving the same issue but I can assure you I wouldn't take any
>>>>>>>>>>>>> code from your PR without consulting with you first.
>>>>>>>>>>>>>
>>>>>>>>>>>> Yes Indeed Peter, I am referring to those.
>>>>>>>>>>>> As for the fix, itself, is not indicative of any thing as its a
>>>>>>>>>>>> one liner, test has uncanny resemblance.
>>>>>>>>>>>>
>>>>>>>>>>>>> As far as I remember, I opened SPARK-56694 /
>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644 because I ran into
>>>>>>>>>>>>> that minor bug during the implementation of
>>>>>>>>>>>>> https://github.com/apache/spark/pull/55298.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry, I didn't check whether a ticket or PR already existed.
>>>>>>>>>>>>>
>>>>>>>>>>>> The below I am addressing to the whole cartel.:
>>>>>>>>>>>> I have experienced this before, as recent as couple of months
>>>>>>>>>>>> back ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>>>>> I have experienced, my personal effort ( going into weeks) to
>>>>>>>>>>>> debug, reproduce issue reliably , being hijacked by members, without even
>>>>>>>>>>>> discussing the fix proposed, ( by opening new PRs). ( If interested, I can
>>>>>>>>>>>> provide details of the PRs / issues I am talking about)
>>>>>>>>>>>> I have seen a perfectly valid PR being nixed , by following
>>>>>>>>>>>> comment which essentially said
>>>>>>>>>>>> " my code of making the cache lookup more effective , would
>>>>>>>>>>>> result in greater chances of stale cache being picked, which already spark
>>>>>>>>>>>> suffers from."
>>>>>>>>>>>> Now the PR was related to collapsing the projects in analysis
>>>>>>>>>>>> phase, and side effect was cache pick up being more sensitive.
>>>>>>>>>>>> So this is such a frivolous reason to nix the PR , because
>>>>>>>>>>>> "staleness" is an underlying existing issue which had nothing to do with my
>>>>>>>>>>>> PR. And its more amusing , that if a DB is giving even one wrong result in
>>>>>>>>>>>> millions, that makes all the results a suspect in any case. It does not
>>>>>>>>>>>> matter at what frequency this occurs. To me the real reason was code
>>>>>>>>>>>> complexity ( & more likely the loss of control of the code to the outsider).
>>>>>>>>>>>>
>>>>>>>>>>>> The reason I call this open source community as cartel, is
>>>>>>>>>>>> because, I have seen the way it works pretty closely and have experienced
>>>>>>>>>>>> it in the email exchanges which happen on this group.
>>>>>>>>>>>> For the same PR , same issue, if advertently or inadvertently ,
>>>>>>>>>>>> other person ( especially a member) gets his changes pushed, by the
>>>>>>>>>>>> virtue of his standing/position and the "for profit" company the person
>>>>>>>>>>>> works, how would you give the credit to the original person who discovered
>>>>>>>>>>>> the issue first / provided the fix?
>>>>>>>>>>>> Why are issues filed by some immediately worked upon by members
>>>>>>>>>>>> ( some of whom claim to be working full time on spark) ? Is it because
>>>>>>>>>>>> certain companies / groups ( for profit companies, mind you ) exert undue
>>>>>>>>>>>> control, or the petty newbee has to be in the good books of members ( with
>>>>>>>>>>>> the hope that at some point they will also reach that position of power ?)
>>>>>>>>>>>>
>>>>>>>>>>>> Given the AI advent and such occurrences, how will you give
>>>>>>>>>>>> due credit to the original creators and how do you plan to prevent some
>>>>>>>>>>>> member for taking up idea of any old open PR ( which for reasons of
>>>>>>>>>>>> complexity and non technical reasons) , polishing it up and pushing it as
>>>>>>>>>>>> their own?
>>>>>>>>>>>>
>>>>>>>>>>>> I am also curious , am I the only one who is troubled by all
>>>>>>>>>>>> this, or there are others who have experienced it?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Asif
>>>>>>>>>>>>
>>>>>>>>>>>>> If you have further improvements please feel free to open a PR.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> I had filed a bug
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-45866
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I had also opened a PR for the same.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now I see that the ticket I filed is still open, but the
>>>>>>>>>>>>>> issue has been fixed using a new ticket
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and on top of that the bug test and ofcourse the fix ( which
>>>>>>>>>>>>>> in any case would be same) has been taken from my PR for
>>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To me this is clear unethical conduct of cartel member,
>>>>>>>>>>>>>> unless I am missing some valid reason.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And the irony is that the fix is still incomplete, as I just
>>>>>>>>>>>>>> found and filed a new ticket
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-57126
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I know that atleast some cartel members are insecure and
>>>>>>>>>>>>>> think of OSS as their fiefdom, but this sort of behaviour , I never
>>>>>>>>>>>>>> expected.
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> Asif
