The last line is to be read as:

That is how exclusivity and good control is to be maintained.

On Fri, May 29, 2026, 8:29 AM Asif Shahid <[email protected]> wrote:

> To the open source C,
> As it's apparent to me and I believe tacitly admitted by the group in
> general and heard explicitly in person
> Any relatively complex PR which involves deeper thinking (be it
> functional or performance issue) should be the business of member.
> If it's performance issue, no way.
> If it's functional issue which is becoming embarrassment to ignore,
> somehow ensure that the push happens under a member's PR.
>
> That is how exclusivity and good is to be maintained.
>
> On Fri, May 29, 2026, 8:05 AM Asif Shahid <[email protected]> wrote:
>
>> Based on the data I have and discussed, it's my view that the PRs opened
>> by you were reactive, happening only after I had opened the initial ticket
>> and PRs.
>> You are talking about simplifying the issue
>> https://github.com/apache/spark/pull/50757#discussion_r2069390537,
>> I am willing to discuss it here, over meeting with other members of your
>> open source group, as to how it simplifies?
>>
>> In fact, I had repeatedly said that why are we discussing in internal
>> channel of company for the PR which I had filed in public Open source. In
>> that discussion (the last one, before I was made redundant by company), I
>> had given detailed explanation of why making each plan node emit
>> indeterministic is bad idea. (I would ask you to make that last slack
>> public, but I am sure that would be an issue as your company policy might
>> prohibit).
>>
>> I understood much earlier why you and your colleague never wanted
>> technical discussions on my public PRs on PR itself.
>>
>> The same holds for other alternate PRs including the issue of "self
>> joins".
>> I am willing to discuss it out with your group members, the problem it
>> solves and what your alternative PR does not.
>>
>> I am not sure if this is generic approach of the "members", to ensure
>> that final checkin happens under their authorship.
>>
>> On Fri, May 29, 2026, 1:58 AM Peter Toth <[email protected]> wrote:
>>
>>> Hi Viquar,
>>>
>>> To resolve the immediate discrepancy, I ask that we formally link
>>>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
>>>> add a JIRA comment explicitly crediting Asif as the original co-discoverer
>>>> of both the regression and the baseline fix. This standard attribution
>>>> costs us nothing but preserves the integrity of our commit history.
>>>
>>> SPARK-56694 is a duplicate of SPARK-45658 (not SPARK-45866), but I agree
>>> it's a fair point to link the tickets and mention Asif's previous work. Let
>>> me add a comment to both the ticket and the PR.
>>>
>>> Conversely, SPARK-56694 bypassed the queue and was merged within eight
>>>> hours
>>>
>>> I don't know, is there a queue? As for my work process, when I have some
>>> time for upstream reviews, I don't follow any queue. I just pick PRs that I
>>> find interesting or that relate to my experience with Spark. And despite
>>> its size, https://github.com/apache/spark/pull/55644/changes is
>>> technically just a one-liner, fairly trivial fix so review within 8 hours
>>> isn't extraordinary.
>>>
>>> Hi Asif,
>>>
>>> you opened an alternate PR, which...
>>>>
>>> What issue did u see in the logic, that an alternate PR was opened...
>>>
>>> I think the reason for my simplification approach was discussed both
>>> offline and online in this thread:
>>> https://github.com/apache/spark/pull/50757#discussion_r2069390537
>>>
>>
>>> Best,
>>> Peter
>>>
>>> On Thu, May 28, 2026 at 10:29 PM vaquar khan <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have thoroughly reviewed the technical artifacts surrounding the
>>>> recent Catalyst optimizer canonicalization discussions to help guide this
>>>> toward a constructive resolution.
>>>>
>>>> We must address a tangible breakdown in our review pipeline.
>>>> SPARK-45866 and its corresponding PR #49154 correctly identified this
>>>> complex Catalyst regression in late 2023, yet the ticket remained
>>>> unaddressed. *Conversely, SPARK-56694 bypassed the queue and was
>>>> merged within eight hours without referencing the prior art*. Peter
>>>> has transparently acknowledged the oversight in searching for existing
>>>> tickets, but we still need to close the loop.
>>>>
>>>> To resolve the immediate discrepancy, *I ask that we formally link
>>>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
>>>> add a JIRA comment explicitly crediting Asif as the original co-discoverer
>>>> of both the regression and the baseline fix. This standard attribution
>>>> costs us nothing but preserves the integrity of our commit history.*
>>>>
>>>> Stepping back, this incident highlights a critical systemic risk to our
>>>> contributor ecosystem. The stark asymmetry in review velocity where an
>>>> external contributor's highly complex PR sits stagnant for months/years,
>>>> while an identical internal PR is merged in hours creates visible friction.
>>>> Even if entirely unintentional due to organizational overload, this pattern
>>>> discourages the high-level engineering talent required to sustain the
>>>> project's momentum.
>>>>
>>>> To maintain Spark’s technical leadership, we must actively cultivate a
>>>> culture where contributions are prioritized strictly by their architectural
>>>> merit, regardless of authorship. Furthermore, we must normalize the habit
>>>> of proactively acknowledging independent work when parallel discoveries
>>>> surface. Small, intentional shifts in our governance and review cadence
>>>> will yield massive dividends in community trust and long-term innovation.
>>>>
>>>> Best regards,
>>>> Viquar Khan
>>>> https://www.linkedin.com/in/vaquar-khan-b695577/?skipRedirect=true
>>>>
>>>> On Thu, 28 May 2026 at 13:42, Asif Shahid <[email protected]>
>>>> wrote:
>>>>
>>>>> Also I must admit that I did not know oss works by opening alternate
>>>>> PRs.
>>>>> In the places where I have worked most of my life, we work on the
>>>>> opened PR with the original author and try to bridge the gap.
>>>>>
>>>>> On Thu, May 28, 2026, 11:25 AM Asif Shahid <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> In fact, I showed it not just to you but other colleague of yours
>>>>>> too. But there has been absolutely no comment or anything on that from
>>>>>> then, till now.
>>>>>>
>>>>>> On Thu, May 28, 2026 at 11:19 AM Asif Shahid <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> also take a look at this jira
>>>>>>> https://issues.apache.org/jira/browse/SPARK-47320
>>>>>>> for this also an alternate PR was opened.
>>>>>>> This problem is so deep in code, that I even showed you that in the
>>>>>>> existing test itself, if the join condition's operand are swapped, test
>>>>>>> fails. It's completely broken, the self joins.
>>>>>>> I had proposed a consistent fix, which solves the issue completely
>>>>>>> and logically, but again an alternate PR was filed.
>>>>>>> What issue was there in my PR, which I created...?
>>>>>>> Regards
>>>>>>> Asif
>>>>>>>
>>>>>>> On Thu, May 28, 2026 at 11:14 AM Asif Shahid <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 28, 2026 at 10:56 AM Peter Toth <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> As for the fix, itself, is not indicative of anything as it's a
>>>>>>>>>> one liner, test has uncanny resemblance
>>>>>>>>>
>>>>>>>>> Asif, what exactly is the "uncanny resemblance" between those test
>>>>>>>>> cases in https://github.com/apache/spark/pull/49154/changes vs
>>>>>>>>> https://github.com/apache/spark/pull/55644/changes ? Besides the
>>>>>>>>> fact that obviously they are comparing canonicalized forms.
>>>>>>>>> Again, sorry for not noticing your PR, but I don't feel my fix has
>>>>>>>>> anything to do with yours.
>>>>>>>>>
>>>>>>>> Ok. I respect your opinion. Each one is entitled to its own view
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2-3
>>>>>>>>>> weeks. I filed a PR. The bug was fixed via a different PR, taken a
>>>>>>>>>> different route.
>>>>>>>>>
>>>>>>>>> Do you see anything in common between
>>>>>>>>> https://github.com/apache/spark/pull/50029/changes and
>>>>>>>>> https://github.com/apache/spark/pull/50757/changes ?
>>>>>>>>> Because I do see. That someone else had a much better idea:
>>>>>>>>> https://github.com/apache/spark/pull/50757#issuecomment-2844972082
>>>>>>>>> / https://github.com/apache/spark/pull/50230 and it was
>>>>>>>>> implemented for the benefit of Spark.
>>>>>>>>> IMO, that's the normal way of dealing with issues in an
>>>>>>>>> open-source project. Ideas come and go and hopefully the one best
>>>>>>>>> wins.
>>>>>>>>>
>>>>>>>> The checksum approach has its expense. That can come later,
>>>>>>>> because a priori it's possible to detect whether the expression is
>>>>>>>> returning value from an indeterministic expression.
>>>>>>>> You opened an alternate PR, which I have described in the PR
>>>>>>>> discussion that to fix the round robin issue which you were dealing with,
>>>>>>>> you are trying to impose an order in in-deterministic expression
>>>>>>>> evaluation, which itself is against the basic premise that if data is
>>>>>>>> in-determinate, there cannot be order in it.
>>>>>>>> What issue did u see in the logic, that an alternate PR was
>>>>>>>> opened...which impacted all the stages (including the ancestors?) and I
>>>>>>>> already discussed internally why the idea you had in mind would not work. I
>>>>>>>> specifically asked, why don't we discuss via the PR filed...
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On Thu, May 28, 2026 at 6:38 PM Asif Shahid <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Nicholas,
>>>>>>>>>> You wanted some examples, right:
>>>>>>>>>> 1) Look at bug https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>>>> To discover this bug and reproduce it reliably, I spent nearly 2-3
>>>>>>>>>> weeks. I filed a PR. The bug was fixed via a different PR, taken a
>>>>>>>>>> different route.
>>>>>>>>>> Did any one who created new PR and route, showed up any
>>>>>>>>>> unaddressable logical issue?
>>>>>>>>>> The same goes for all the PRs (which in case I have closed)
>>>>>>>>>> Regards
>>>>>>>>>> Asif
>>>>>>>>>>
>>>>>>>>>> On Thu, May 28, 2026 at 9:06 AM Nicholas Chammas <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think repeatedly calling the contributors on this list a
>>>>>>>>>>> “cartel” is not conducive to a calm and amicable resolution.
>>>>>>>>>>>
>>>>>>>>>>> You may have some history built up that led you to use that
>>>>>>>>>>> word, but to the rest of us it comes out of nowhere; you in fact opened
>>>>>>>>>>> this thread with that attack. If you keep making your case in this manner,
>>>>>>>>>>> you will just turn everyone against you.
>>>>>>>>>>>
>>>>>>>>>>> If there is a history of what you feel is others stealing your
>>>>>>>>>>> work, please link to a few examples so we can see what you are seeing. If
>>>>>>>>>>> you can’t do that, then just focus on this current example. And try to
>>>>>>>>>>> refrain from calling people names unless your goal is just to have a fight,
>>>>>>>>>>> as opposed to resolving the problematic behavior so you can continue to
>>>>>>>>>>> contribute.
>>>>>>>>>>>
>>>>>>>>>>> I am not a committer and don’t have any special role in this
>>>>>>>>>>> community. I am speaking just as an observer and regular contributor to the
>>>>>>>>>>> project.
>>>>>>>>>>>
>>>>>>>>>>> > I have experienced this before, as recent as couple of months
>>>>>>>>>>> back ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>>>>
>>>>>>>>>>> For others following along, I took a look at this ticket and the
>>>>>>>>>>> associated PRs: #53261
>>>>>>>>>>> <https://github.com/apache/spark/pull/53261> / #53100
>>>>>>>>>>> <https://github.com/apache/spark/pull/53100>
>>>>>>>>>>>
>>>>>>>>>>> It looks like Asif is upset that he submitted a fix for the same
>>>>>>>>>>> issue a week or so prior to the fix that eventually got merged. But the
>>>>>>>>>>> fixes are different, and the one that got merged is a lot shorter, though
>>>>>>>>>>> they are both simple. The PR that got merged was submitted by someone who
>>>>>>>>>>> appears to be employed by Databricks; perhaps this is part of the “cartel”
>>>>>>>>>>> accusation. The two PRs were reviewed by different committers, however, and
>>>>>>>>>>> the one that got merged was merged in by someone who does _not_ work for
>>>>>>>>>>> Databricks.
>>>>>>>>>>>
>>>>>>>>>>> I don’t see anything here other than the normal dynamic of a
>>>>>>>>>>> large and busy open source project. Committer attention is limited; things
>>>>>>>>>>> fall through the cracks; different contributors may occasionally work on
>>>>>>>>>>> the same thing without knowing about each other. A minor help to this
>>>>>>>>>>> specific problem would be to have some way of automatically linking issues
>>>>>>>>>>> that appear to be about the same thing.
>>>>>>>>>>>
>>>>>>>>>>> Nick
>>>>>>>>>>>
>>>>>>>>>>> On May 28, 2026, at 11:33 AM, Asif Shahid <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>> Pls see inline for comments/ replies
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 28, 2026 at 6:11 AM Peter Toth <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Asif,
>>>>>>>>>>>>
>>>>>>>>>>>> Are you referring to
>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs.
>>>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes? Those are
>>>>>>>>>>>> definitely solving the same issue but I can assure you I wouldn't take any
>>>>>>>>>>>> code from your PR without consulting with you first.
>>>>>>>>>>>>
>>>>>>>>>>> Yes Indeed Peter, I am referring to those.
>>>>>>>>>>> As for the fix, itself, is not indicative of anything as it's a
>>>>>>>>>>> one liner, test has uncanny resemblance.
>>>>>>>>>>>
>>>>>>>>>>>> As far as I remember, I opened SPARK-56694 /
>>>>>>>>>>>> https://github.com/apache/spark/pull/55644 because I ran into
>>>>>>>>>>>> that minor bug during the implementation of
>>>>>>>>>>>> https://github.com/apache/spark/pull/55298.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Sorry, I didn't check whether a ticket or PR already existed.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The below I am addressing to the whole cartel:
>>>>>>>>>>> I have experienced this before, as recent as couple of months
>>>>>>>>>>> back ( https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>>>> I have experienced, my personal effort (going into weeks) to
>>>>>>>>>>> debug, reproduce issue reliably, being hijacked by members, without even
>>>>>>>>>>> discussing the fix proposed, (by opening new PRs). (If interested, I can
>>>>>>>>>>> provide details of the PRs / issues I am talking about)
>>>>>>>>>>> I have seen a perfectly valid PR being nixed, by following
>>>>>>>>>>> comment which essentially said
>>>>>>>>>>> "my code of making the cache lookup more effective, would
>>>>>>>>>>> result in greater chances of stale cache being picked, which already spark
>>>>>>>>>>> suffers from."
>>>>>>>>>>> Now the PR was related to collapsing the projects in analysis
>>>>>>>>>>> phase, and side effect was cache pick up being more sensitive.
>>>>>>>>>>> So this is such a frivolous reason to nix the PR, because
>>>>>>>>>>> "staleness" is an underlying existing issue which had nothing to do with my
>>>>>>>>>>> PR. And it's more amusing, that if a DB is giving even one wrong result in
>>>>>>>>>>> millions, that makes all the results a suspect in any case. It does not
>>>>>>>>>>> matter at what frequency this occurs. To me the real reason was code
>>>>>>>>>>> complexity (& more likely the loss of control of the code to the outsider).
>>>>>>>>>>>
>>>>>>>>>>> The reason I call this open source community as cartel, is
>>>>>>>>>>> because, I have seen the way it works pretty closely and have experienced
>>>>>>>>>>> it in the email exchanges which happen on this group.
>>>>>>>>>>> For the same PR, same issue, if advertently or inadvertently,
>>>>>>>>>>> other person (especially a member) gets his changes pushed, by the virtue
>>>>>>>>>>> of his standing/position and the "for profit" company the person works, how
>>>>>>>>>>> would you give the credit to the original person who discovered the issue
>>>>>>>>>>> first / provided the fix?
>>>>>>>>>>> Why are issues filed by some immediately worked upon by members
>>>>>>>>>>> (some of whom claim to be working full time on spark)? Is it because
>>>>>>>>>>> certain companies / groups (for profit companies, mind you) exert undue
>>>>>>>>>>> control, or the petty newbie has to be in the good books of members (with
>>>>>>>>>>> the hope that at some point they will also reach that position of power?)
>>>>>>>>>>>
>>>>>>>>>>> Given the AI advent and such occurrences, how will you give due
>>>>>>>>>>> credit to the original creators and how do you plan to prevent some member
>>>>>>>>>>> from taking up idea of any old open PR (which for reasons of complexity and
>>>>>>>>>>> non technical reasons), polishing it up and pushing it as their own?
>>>>>>>>>>>
>>>>>>>>>>> I am also curious, am I the only one who is troubled by all
>>>>>>>>>>> this, or there are others who have experienced it?
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Asif
>>>>>>>>>>>
>>>>>>>>>>>> If you have further improvements please feel free to open a PR.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> I had filed a bug
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-45866
>>>>>>>>>>>>>
>>>>>>>>>>>>> I had also opened a PR for the same.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now I see that the ticket I filed is still open, but the
>>>>>>>>>>>>> issue has been fixed using a new ticket
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>>>>>>>>>>>
>>>>>>>>>>>>> and on top of that the bug test and of course the fix (which
>>>>>>>>>>>>> in any case would be same) has been taken from my PR for
>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>>>>>>>>>>>
>>>>>>>>>>>>> To me this is clear unethical conduct of cartel member, unless
>>>>>>>>>>>>> I am missing some valid reason.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And the irony is that the fix is still incomplete, as I just
>>>>>>>>>>>>> found and filed a new ticket
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-57126
>>>>>>>>>>>>>
>>>>>>>>>>>>> I know that at least some cartel members are insecure and think
>>>>>>>>>>>>> of OSS as their fiefdom, but this sort of behaviour, I never expected.
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Asif
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
