Thank you all. I agree with Mich that it has reached its shelf life. Points heard, points shared. I have no doubt that whatever all of us have said has been in full honesty and is what we all genuinely believe in. From my side, this is my last email on this in the group. Thank you Peter, Wenchen, Holden, Mich, Nicholas, Sean, Vaquar and Tian for putting forth your views.
On Sat, May 30, 2026, 7:00 AM Holden Karau <[email protected]> wrote: > I think the insistence that the folks are acting in bad faith makes it > hard for people to make progress and I’m stoked that folks are discussing > it. I know that I’ve run into this myself even as a committer, where I’ll > find an issue, make a JIRA and a fix, and then someone else will merge a fix > or propose a different solution before I merge mine without searching for > what already exists. > > I know I’ve done the same to other folks in reverse. > > I think this serves as a good reminder to use search, at least for open > PRs, when we’re taking on projects. > > One of the things I’ve tried to do, albeit imperfectly, when something is > time-critical and I can’t do the full back-and-forth review cycle is making > a co-authored PR which includes others’ contributions cherry-picked in to > give everyone credit for their work (of course this doesn’t always work, > since sometimes different folks go in different directions on the same > problem). > > As for the greater engagement on the threads welcoming new committers, it’s easy to > celebrate our successes (and we should! the world is harsh and joy is > precious), but reading a thread calling a group that one may be a part of > in a negative light is less fun. > > I am also hopeful that the periodic community syncs, while not taking > any decisions per se, will help foster more positive community interaction > and make it easier for new folks to get their code reviewed. > > Similarly, we’ve (Huaxin and Felix and myself in person, with several other > committers helping virtually/async) organized new contributor to Spark > sprints where we try and make sure we’ve got dedicated review bandwidth for > the new folks. The last one was in Seattle and I think we might try and do > a Bay Area one and then a virtual one.
> > Reviewer burnout is a real challenge in OSS, and while many of the PMC may > work on or with Spark in their day jobs, the review time is often not > something their employers prioritize or reward (at least in my experience > it’s the exception rather than the norm). > > > Twitter: https://twitter.com/holdenkarau > Fight Health Insurance: https://www.fighthealthinsurance.com/ > <https://www.fighthealthinsurance.com/?q=hk_email> > Books (Learning Spark, High Performance Spark, etc.): > https://amzn.to/2MaRAG9 > YouTube Live Streams: https://www.youtube.com/user/holdenkarau > Pronouns: she/her > > On Sat, May 30, 2026 at 6:52 AM Mich Talebzadeh <[email protected]> > wrote: > >> My view is that this discussion has reached "*the point of diminishing >> returns*". At this stage, there is very little technical content left, >> and the focus has shifted to questions of attribution and individual >> grievances, *whether justified or not*. I understand Asif's concerns, >> but Spark is ultimately about improving the software, not awarding prizes >> for who first identified an issue or proposed a solution. Most people here >> are engineers focused on solving technical problems and moving the project >> forward. >> >> The positions have been stated clearly. Perhaps it is time to accept that >> not everyone will agree and move on, returning the focus to Spark itself.
>> >> HTH >> >> Mich Talebzadeh, >> >> Data Scientist | Distributed Systems (Spark) | Financial & Metadata >> Forensics | Transaction Reconstruction | Audit Analytics | Critical Data >> Element (CDE) Traceability >> >> >> >> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial >> College London <https://en.wikipedia.org/wiki/Imperial_College_London> >> >> >> >> >> On Sat, 30 May 2026 at 10:08, Andrew Melo <[email protected]> wrote: >> >>> Hello Wenchen, >>> >>> I think that there is some level of irony that Asif's thread about >>> having PRs be ignored/co-opted is interleaved with the announcement of two >>> new committers being added to the project. It's a bit telling that there is >>> more engagement on each of those individual congratulations threads than >>> on this thread. >>> >>> I do have one strong critique of your mail below, and I think that >>> Vaquar nailed the implications: >>> >>> On Sat, May 30, 2026 at 13:18 Wenchen Fan <[email protected]> wrote: >>> >>>> I can understand the frustration of PRs being ignored while authors >>>> have put a lot of effort into them. In an ideal world, all PRs deserve a decent >>>> review/discussion from the community and at least one committer, so that >>>> the PR can get merged or rejected with a clear reason. However, the reality >>>> is that committers have limited time and attention, and actively monitoring all >>>> open PRs and JIRAs is nearly impossible. I just checked and there are 360+ >>>> open PRs right now. >>>> >>>> I don't have a good solution here. I've been trying my best to review >>>> PRs that ping me, but there is no guaranteed review (sometimes I was just >>>> too tired, or misclicked the "mark as read" button). I don't think there is >>>> a "control" here; it's all about how you can get attention. Building your >>>> credibility with small tasks (which get attention more easily) is one way.
>>>> >>> >>> I share Asif's experience -- I tried to put in a small change which was >>> ignored and then later closed because nobody had time to review it. Perhaps >>> you find it unfair, but I think that Vaquar is correct -- if new >>> users/contributors can't even get someone to look at trivial patches, then >>> how can... >>> >>> >>> @huaxin gao <[email protected]> is proposing a Monthly Spark >>>> Community Sync, which may be a good place to present complex PRs/proposals >>>> and get attention. >>>> >>> >>> ... you even get to the place that you're looking at complex PRs. >>> >>> There were some people up-thread who critiqued the usage of "cabal", which >>> I think is easy to chalk up to a language difference. >>> >>> As a native English speaker, I wouldn't use that word because "cabal" >>> has a pejorative meaning (maybe "in-group" fits better), but the outcome >>> is the same. If you're in the in-group, then you can make code happen in >>> hours; otherwise you have to either hope someone reviews your code or >>> pester them to look at it. >>> >>> I spent a lot of time both writing Spark plugins and otherwise >>> championing the use of Spark in high-energy physics, got disillusioned, and >>> moved on to other frameworks because of how the project is structured and >>> incentivized. There is a lot of promise in Spark from a technical >>> standpoint and I hope that it can continue to grow. >>> >>> Sincerely >>> Andrew >>> >>> >>>> On Sat, May 30, 2026 at 12:29 AM Asif Shahid <[email protected]> >>>> wrote: >>>> >>>>> This control, exclusivity, the requirement to build credibility by >>>>> starting small (like fixing formatting, stats, logs), to leave complex >>>>> issues to other "bright" minds, the informal hierarchy: maybe this is >>>>> how open source works. >>>>> Whatever it is, it does not sound like "community" to me. >>>>> This is a club ("cartel" is offensive to some or most), with the usual >>>>> struggle for power, control and politics.
>>>>> >>>>> On Fri, May 29, 2026, 8:31 AM Asif Shahid <[email protected]> >>>>> wrote: >>>>> >>>>>> The last line is to be read as: >>>>>> >>>>>> >>>>>> That is how exclusivity and good control is to be maintained. >>>>>> >>>>>> On Fri, May 29, 2026, 8:29 AM Asif Shahid <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> To the open source C, >>>>>>> As is apparent to me, and I believe tacitly admitted by the group >>>>>>> in general and heard explicitly in person: >>>>>>> Any relatively complex PR which involves deeper thinking (be it a >>>>>>> functional or performance issue) should be the business of a member. >>>>>>> If it's a performance issue, no way. >>>>>>> If it's a functional issue which is becoming an embarrassment to ignore, >>>>>>> somehow ensure that the push happens under a member's PR. >>>>>>> >>>>>>> That is how exclusivity and good is to be maintained. >>>>>>> >>>>>>> >>>>>>> On Fri, May 29, 2026, 8:05 AM Asif Shahid <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Based on the data I have and discussed, it's my view that the PRs >>>>>>>> opened by you were reactive, happening only after I had opened the >>>>>>>> initial >>>>>>>> ticket and PRs. >>>>>>>> You are talking about simplifying the issue >>>>>>>> https://github.com/apache/spark/pull/50757#discussion_r2069390537; >>>>>>>> I am willing to discuss here, or in a meeting with other members >>>>>>>> of your open source group, how it simplifies the issue. >>>>>>>> >>>>>>>> In fact, I had repeatedly asked why we were discussing, in the >>>>>>>> company's internal channel, a PR which I had filed in public open >>>>>>>> source. In that discussion (the last one, before I was made >>>>>>>> redundant by >>>>>>>> the company), I had given a detailed explanation of why making each plan >>>>>>>> node >>>>>>>> emit indeterministic is a bad idea. (I would ask you to make that last >>>>>>>> Slack discussion public, but I am sure that would be an issue, as your company >>>>>>>> policy >>>>>>>> might prohibit it.)
>>>>>>>> >>>>>>>> I understood much earlier why you and your colleague never wanted >>>>>>>> technical discussions on my public PRs to happen on the PR itself. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> The same holds for other alternate PRs, including the one for the >>>>>>>> "self joins" issue. >>>>>>>> I am willing to talk it out with your group members: the problem >>>>>>>> it solves, which your alternative PR does not. >>>>>>>> >>>>>>>> >>>>>>>> I am not sure if this is the general approach of the "members": to >>>>>>>> ensure that the final check-in happens under their authorship. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, May 29, 2026, 1:58 AM Peter Toth <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Viquar, >>>>>>>>> >>>>>>>>> To resolve the immediate discrepancy, I ask that we formally link >>>>>>>>>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed >>>>>>>>>> by," and >>>>>>>>>> add a JIRA comment explicitly crediting Asif as the original >>>>>>>>>> co-discoverer >>>>>>>>>> of both the regression and the baseline fix. This standard >>>>>>>>>> attribution >>>>>>>>>> costs us nothing but preserves the integrity of our commit history. >>>>>>>>> >>>>>>>>> >>>>>>>>> SPARK-56694 is a duplicate of SPARK-45658 (not SPARK-45866), but I >>>>>>>>> agree it's a fair point to link the tickets and mention Asif's >>>>>>>>> previous >>>>>>>>> work. Let me add a comment to both the ticket and the PR. >>>>>>>>> >>>>>>>>> Conversely, SPARK-56694 bypassed the queue and was merged within >>>>>>>>>> eight hours >>>>>>>>> >>>>>>>>> >>>>>>>>> I don't know; is there a queue? As for my work process, when I >>>>>>>>> have some time for upstream reviews, I don't follow any queue. I just >>>>>>>>> pick >>>>>>>>> PRs that I find interesting or that relate to my experience with >>>>>>>>> Spark.
And >>>>>>>>> despite its size, >>>>>>>>> https://github.com/apache/spark/pull/55644/changes is technically >>>>>>>>> just a one-liner, a fairly trivial fix, so a review within 8 hours isn't >>>>>>>>> extraordinary. >>>>>>>>> >>>>>>>>> Hi Asif, >>>>>>>>> >>>>>>>>> you opened an alternate PR, which... >>>>>>>>>> >>>>>>>>> What issue did you see in the logic, that an alternate PR was >>>>>>>>>> opened... >>>>>>>>> >>>>>>>>> >>>>>>>>> I think the reason for my simplification approach was discussed >>>>>>>>> both offline and online in this thread: >>>>>>>>> https://github.com/apache/spark/pull/50757#discussion_r2069390537 >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Best, >>>>>>>>> Peter >>>>>>>>> >>>>>>>>> On Thu, May 28, 2026 at 10:29 PM vaquar khan < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> I have thoroughly reviewed the technical artifacts surrounding >>>>>>>>>> the recent Catalyst optimizer canonicalization discussions to help >>>>>>>>>> guide >>>>>>>>>> this toward a constructive resolution. >>>>>>>>>> >>>>>>>>>> We must address a tangible breakdown in our review pipeline. >>>>>>>>>> SPARK-45866 and its corresponding PR #49154 correctly identified this >>>>>>>>>> complex Catalyst regression in late 2023, yet the ticket remained >>>>>>>>>> unaddressed. *Conversely, SPARK-56694 bypassed the queue and was >>>>>>>>>> merged within eight hours without referencing the prior art*. >>>>>>>>>> Peter has transparently acknowledged the oversight in searching for >>>>>>>>>> existing tickets, but we still need to close the loop. >>>>>>>>>> >>>>>>>>>> To resolve the immediate discrepancy, *I ask that we formally >>>>>>>>>> link SPARK-45866 / PR #49154 within SPARK-56694 as "previously >>>>>>>>>> proposed >>>>>>>>>> by," and add a JIRA comment explicitly crediting Asif as the original >>>>>>>>>> co-discoverer of both the regression and the baseline fix. This
This >>>>>>>>>> standard >>>>>>>>>> attribution costs us nothing but preserves the integrity of our >>>>>>>>>> commit >>>>>>>>>> history. * >>>>>>>>>> >>>>>>>>>> Stepping back, this incident highlights a critical systemic risk >>>>>>>>>> to our contributor ecosystem. The stark asymmetry in review velocity >>>>>>>>>> where >>>>>>>>>> an external contributor's highly complex PR sits stagnant for >>>>>>>>>> months/years, >>>>>>>>>> while an identical internal PR is merged in hours creates visible >>>>>>>>>> friction. >>>>>>>>>> Even if entirely unintentional due to organizational overload, this >>>>>>>>>> pattern >>>>>>>>>> discourages the high-level engineering talent required to sustain the >>>>>>>>>> project's momentum. >>>>>>>>>> >>>>>>>>>> To maintain Spark’s technical leadership, we must actively >>>>>>>>>> cultivate a culture where contributions are prioritized strictly by >>>>>>>>>> their >>>>>>>>>> architectural merit, regardless of authorship. Furthermore, we must >>>>>>>>>> normalize the habit of proactively acknowledging independent work >>>>>>>>>> when >>>>>>>>>> parallel discoveries surface. Small, intentional shifts in our >>>>>>>>>> governance >>>>>>>>>> and review cadence will yield massive dividends in community trust >>>>>>>>>> and >>>>>>>>>> long-term innovation. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Viquar Khan >>>>>>>>>> https://www.linkedin.com/in/vaquar-khan-b695577/?skipRedirect=true >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, 28 May 2026 at 13:42, Asif Shahid <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Also I must admit that I did not know oss works by opening >>>>>>>>>>> alternate PRs. >>>>>>>>>>> In the places where I have worked most of my life, we work on >>>>>>>>>>> the opened PR with the original author and try to bridge the gap. 
>>>>>>>>>>> >>>>>>>>>>> On Thu, May 28, 2026, 11:25 AM Asif Shahid < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> In fact, I showed it not just to you but to another colleague of >>>>>>>>>>>> yours too. But there has been absolutely no comment or anything on >>>>>>>>>>>> that >>>>>>>>>>>> from then till now. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, May 28, 2026 at 11:19 AM Asif Shahid < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Also take a look at this JIRA: >>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-47320 >>>>>>>>>>>>> For this one, too, an alternate PR was opened. >>>>>>>>>>>>> This problem is so deep in the code that I even showed you that, >>>>>>>>>>>>> in the existing test itself, if the join condition's operands are >>>>>>>>>>>>> swapped, >>>>>>>>>>>>> the test fails. Self-joins are completely broken. >>>>>>>>>>>>> I had proposed a consistent fix, which solves the issue >>>>>>>>>>>>> completely and logically, but again an alternate PR was filed. >>>>>>>>>>>>> What issue was there in the PR I created? >>>>>>>>>>>>> Regards >>>>>>>>>>>>> Asif >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, May 28, 2026 at 11:14 AM Asif Shahid < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, May 28, 2026 at 10:56 AM Peter Toth < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As for the fix itself, it is not indicative of anything, as >>>>>>>>>>>>>>>> it's a one-liner; the test has an uncanny resemblance >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Asif, what exactly is the "uncanny resemblance" between >>>>>>>>>>>>>>> those test cases in >>>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs >>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes ? >>>>>>>>>>>>>>> Besides the fact that obviously they are comparing >>>>>>>>>>>>>>> canonicalized forms.
>>>>>>>>>>>>>>> Again, sorry for not noticing your PR, but I don't feel my >>>>>>>>>>>>>>> fix has anything to do with yours. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> OK. I respect your opinion. Everyone is entitled to their own >>>>>>>>>>>>>> view. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1) Look at bug >>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-51016 >>>>>>>>>>>>>>>> To discover this bug and reproduce it reliably, I spent >>>>>>>>>>>>>>>> nearly 2-3 weeks. I filed a PR. The bug was fixed via a >>>>>>>>>>>>>>>> different PR, >>>>>>>>>>>>>>>> which took a different route. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Do you see anything in common between >>>>>>>>>>>>>>> https://github.com/apache/spark/pull/50029/changes and >>>>>>>>>>>>>>> https://github.com/apache/spark/pull/50757/changes ? >>>>>>>>>>>>>>> Because I do see one thing in common: someone else had a much better idea >>>>>>>>>>>>>>> https://github.com/apache/spark/pull/50757#issuecomment-2844972082 >>>>>>>>>>>>>>> / https://github.com/apache/spark/pull/50230 and it was >>>>>>>>>>>>>>> implemented for the benefit of Spark. >>>>>>>>>>>>>>> IMO, that's the normal way of dealing with issues in an >>>>>>>>>>>>>>> open-source project. Ideas come and go and hopefully the best >>>>>>>>>>>>>>> one wins. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> The checksum approach has its cost. That can come later, >>>>>>>>>>>>>> because a priori it's possible to detect whether the expression is >>>>>>>>>>>>>> returning >>>>>>>>>>>>>> a value from an indeterministic expression. >>>>>>>>>>>>>> You opened an alternate PR, which, as I have described in the PR >>>>>>>>>>>>>> discussion, tries to fix the round-robin issue you were >>>>>>>>>>>>>> dealing with >>>>>>>>>>>>>> by imposing an order on indeterministic expression >>>>>>>>>>>>>> evaluation, which itself goes against the basic premise that if >>>>>>>>>>>>>> data is >>>>>>>>>>>>>> indeterminate, there cannot be an order in it.
>>>>>>>>>>>>>> What issue did you see in the logic, that an alternate PR was >>>>>>>>>>>>>> opened... one which impacted all the stages (including the >>>>>>>>>>>>>> ancestors)? I had >>>>>>>>>>>>>> already discussed internally why the idea you had in mind would >>>>>>>>>>>>>> not work. I >>>>>>>>>>>>>> specifically asked why we don't discuss it via the PR I filed... >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Peter >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, May 28, 2026 at 6:38 PM Asif Shahid < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Nicholas, >>>>>>>>>>>>>>>> You wanted some examples, right? >>>>>>>>>>>>>>>> 1) Look at bug >>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-51016 >>>>>>>>>>>>>>>> To discover this bug and reproduce it reliably, I spent >>>>>>>>>>>>>>>> nearly 2-3 weeks. I filed a PR. The bug was fixed via a >>>>>>>>>>>>>>>> different PR, >>>>>>>>>>>>>>>> which took a different route. >>>>>>>>>>>>>>>> Did anyone who created the new PR and route identify any >>>>>>>>>>>>>>>> unaddressable logical issue? >>>>>>>>>>>>>>>> The same goes for all the PRs (which, in any case, I have closed). >>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>> Asif >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, May 28, 2026 at 9:06 AM Nicholas Chammas < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think repeatedly calling the contributors on this list a >>>>>>>>>>>>>>>>> “cartel” is not conducive to a calm and amicable resolution. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You may have some history built up that led you to use >>>>>>>>>>>>>>>>> that word, but to the rest of us it comes out of nowhere; you >>>>>>>>>>>>>>>>> in fact >>>>>>>>>>>>>>>>> opened this thread with that attack. If you keep making your >>>>>>>>>>>>>>>>> case in this >>>>>>>>>>>>>>>>> manner, you will just turn everyone against you.
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If there is a history of what you feel is others stealing >>>>>>>>>>>>>>>>> your work, please link to a few examples so we can see what >>>>>>>>>>>>>>>>> you are seeing. >>>>>>>>>>>>>>>>> If you can’t do that, then just focus on this current >>>>>>>>>>>>>>>>> example. And try to >>>>>>>>>>>>>>>>> refrain from calling people names unless your goal is just to >>>>>>>>>>>>>>>>> have a fight, >>>>>>>>>>>>>>>>> as opposed to resolving the problematic behavior so you can >>>>>>>>>>>>>>>>> continue to >>>>>>>>>>>>>>>>> contribute. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I am not a committer and don’t have any special role in >>>>>>>>>>>>>>>>> this community. I am speaking just as an observer and regular >>>>>>>>>>>>>>>>> contributor >>>>>>>>>>>>>>>>> to the project. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> > I have experienced this before, as recently as a couple of >>>>>>>>>>>>>>>>> months back ( >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-54386) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For others following along, I took a look at this ticket >>>>>>>>>>>>>>>>> and the associated PRs: #53261 >>>>>>>>>>>>>>>>> <https://github.com/apache/spark/pull/53261> / #53100 >>>>>>>>>>>>>>>>> <https://github.com/apache/spark/pull/53100> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It looks like Asif is upset that he submitted a fix for >>>>>>>>>>>>>>>>> the same issue a week or so prior to the fix that eventually >>>>>>>>>>>>>>>>> got merged. >>>>>>>>>>>>>>>>> But the fixes are different, and the one that got merged is a >>>>>>>>>>>>>>>>> lot shorter, >>>>>>>>>>>>>>>>> though they are both simple. The PR that got merged was >>>>>>>>>>>>>>>>> submitted by >>>>>>>>>>>>>>>>> someone who appears to be employed by Databricks; perhaps >>>>>>>>>>>>>>>>> this is part of >>>>>>>>>>>>>>>>> the “cartel” accusation.
The two PRs were reviewed by >>>>>>>>>>>>>>>>> different committers, >>>>>>>>>>>>>>>>> however, and the one that got merged was merged in by someone >>>>>>>>>>>>>>>>> who does >>>>>>>>>>>>>>>>> _not_ work for Databricks. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I don’t see anything here other than the normal dynamic of >>>>>>>>>>>>>>>>> a large and busy open source project. Committer attention is >>>>>>>>>>>>>>>>> limited; >>>>>>>>>>>>>>>>> things fall through the cracks; different contributors may >>>>>>>>>>>>>>>>> occasionally >>>>>>>>>>>>>>>>> work on the same thing without knowing about each other. A >>>>>>>>>>>>>>>>> minor help to >>>>>>>>>>>>>>>>> this specific problem would be to have some way of >>>>>>>>>>>>>>>>> automatically linking >>>>>>>>>>>>>>>>> issues that appear to be about the same thing. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Nick >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On May 28, 2026, at 11:33 AM, Asif Shahid < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Peter, >>>>>>>>>>>>>>>>> Pls see inline for comments/ replies >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, May 28, 2026 at 6:11 AM Peter Toth < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hey Asif, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Are you referring to >>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs. >>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes? >>>>>>>>>>>>>>>>>> Those are definitely solving the same issue but I can assure >>>>>>>>>>>>>>>>>> you I wouldn't >>>>>>>>>>>>>>>>>> take any code from your PR without consulting with you first. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes Indeed Peter, I am referring to those. >>>>>>>>>>>>>>>>> As for the fix, itself, is not indicative of any thing as >>>>>>>>>>>>>>>>> its a one liner, test has uncanny resemblance. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> As far as I remember, I opened SPARK-56694 / >>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644 because I ran >>>>>>>>>>>>>>>>>> into that minor bug during the implementation of >>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55298. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sorry, I didn't check whether a ticket or PR already >>>>>>>>>>>>>>>>>> existed. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I am addressing the following to the whole cartel: >>>>>>>>>>>>>>>>> I have experienced this before, as recently as a couple of >>>>>>>>>>>>>>>>> months back ( >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-54386) >>>>>>>>>>>>>>>>> I have experienced my personal effort (going into >>>>>>>>>>>>>>>>> weeks) to debug and reliably reproduce an issue being hijacked by >>>>>>>>>>>>>>>>> members >>>>>>>>>>>>>>>>> opening new PRs, without even discussing the proposed fix. (If >>>>>>>>>>>>>>>>> interested, I can provide details of the PRs/issues I am >>>>>>>>>>>>>>>>> talking about.) >>>>>>>>>>>>>>>>> I have seen a perfectly valid PR being nixed by a >>>>>>>>>>>>>>>>> comment which essentially said >>>>>>>>>>>>>>>>> "my code making the cache lookup more effective >>>>>>>>>>>>>>>>> would result in a greater chance of a stale cache being picked, >>>>>>>>>>>>>>>>> which Spark already >>>>>>>>>>>>>>>>> suffers from." >>>>>>>>>>>>>>>>> Now, the PR was related to collapsing the projects in the >>>>>>>>>>>>>>>>> analysis phase, and the side effect was cache pickup becoming more >>>>>>>>>>>>>>>>> sensitive. >>>>>>>>>>>>>>>>> So this is a frivolous reason to nix the PR, because >>>>>>>>>>>>>>>>> "staleness" is an existing underlying issue which had nothing >>>>>>>>>>>>>>>>> to do with my >>>>>>>>>>>>>>>>> PR.
And its more amusing , that if a DB is giving even one >>>>>>>>>>>>>>>>> wrong result in >>>>>>>>>>>>>>>>> millions, that makes all the results a suspect in any case. >>>>>>>>>>>>>>>>> It does not >>>>>>>>>>>>>>>>> matter at what frequency this occurs. To me the real reason >>>>>>>>>>>>>>>>> was code >>>>>>>>>>>>>>>>> complexity ( & more likely the loss of control of the code >>>>>>>>>>>>>>>>> to the >>>>>>>>>>>>>>>>> outsider). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The reason I call this open source community as cartel, is >>>>>>>>>>>>>>>>> because, I have seen the way it works pretty closely and have >>>>>>>>>>>>>>>>> experienced >>>>>>>>>>>>>>>>> it in the email exchanges which happen on this group. >>>>>>>>>>>>>>>>> For the same PR , same issue, if advertently or >>>>>>>>>>>>>>>>> inadvertently , other person ( especially a member) gets his >>>>>>>>>>>>>>>>> changes >>>>>>>>>>>>>>>>> pushed, by the virtue of his standing/position and the "for >>>>>>>>>>>>>>>>> profit" company >>>>>>>>>>>>>>>>> the person works, how would you give the credit to the >>>>>>>>>>>>>>>>> original person who >>>>>>>>>>>>>>>>> discovered the issue first / provided the fix? >>>>>>>>>>>>>>>>> Why are issues filed by some immediately worked upon by >>>>>>>>>>>>>>>>> members ( some of whom claim to be working full time on >>>>>>>>>>>>>>>>> spark) ? Is it >>>>>>>>>>>>>>>>> because certain companies / groups ( for profit companies, >>>>>>>>>>>>>>>>> mind you ) >>>>>>>>>>>>>>>>> exert undue control, or the petty newbee has to be in the >>>>>>>>>>>>>>>>> good books of >>>>>>>>>>>>>>>>> members ( with the hope that at some point they will also >>>>>>>>>>>>>>>>> reach that >>>>>>>>>>>>>>>>> position of power ?) 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Given the advent of AI and such occurrences, how will you >>>>>>>>>>>>>>>>> give due credit to the original creators, and how do you plan >>>>>>>>>>>>>>>>> to prevent >>>>>>>>>>>>>>>>> a member from taking up the idea of any old open PR (left open for >>>>>>>>>>>>>>>>> reasons of >>>>>>>>>>>>>>>>> complexity and non-technical reasons), polishing it up and >>>>>>>>>>>>>>>>> pushing it as >>>>>>>>>>>>>>>>> their own? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I am also curious: am I the only one who is troubled by >>>>>>>>>>>>>>>>> all this, or are there others who have experienced it? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>> Asif >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If you have further improvements, please feel free to open >>>>>>>>>>>>>>>>>> a PR. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>> Peter >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>> I had filed a bug >>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-45866 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I had also opened a PR for the same. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Now I see that the ticket I filed is still open, but >>>>>>>>>>>>>>>>>>> the issue has been fixed using a new ticket >>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-56694 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> and on top of that, the bug test and of course the fix >>>>>>>>>>>>>>>>>>> (which in any case would be the same) have been taken from my PR: >>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> To me this is clearly unethical conduct by a cartel member, >>>>>>>>>>>>>>>>>>> unless I am missing some valid reason.
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> And the irony is that the fix is still incomplete, as I >>>>>>>>>>>>>>>>>>> just found and filed a new ticket >>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-57126 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I know that at least some cartel members are insecure and >>>>>>>>>>>>>>>>>>> think of OSS as their fiefdom, but I never expected this sort of >>>>>>>>>>>>>>>>>>> behaviour. >>>>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>>>> Asif >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>
