I think the insistence that folks are acting in bad faith makes it hard for people to make progress, and I’m stoked that folks are discussing it. I know that I’ve run into this myself, even as a committer: I’ll find an issue, make a JIRA and a fix, and then someone else will merge a fix or propose a different solution before I merge mine, without searching for what already exists.
I know I’ve done the same to other folks in reverse. I think this serves as a good reminder to use search, at least for open PRs, when we’re taking on projects. One of the things I’ve tried to do, albeit imperfectly, when something is time critical and I can’t do the full back-and-forth review cycle is to make a co-authored PR that includes others’ contributions cherry-picked in, to give everyone credit for their work (of course this doesn’t always work, since sometimes different folks go in different directions on the same problem).

As for there being more engagement on the threads welcoming new committers: it’s easy to celebrate our successes (and we should! the world is harsh and joy is precious), but reading a thread that casts a group one may be a part of in a negative light is less fun.

I am also hopeful that the periodic community syncs, while not making any decisions per se, will help foster more positive community interaction and make it easier for new folks to get their code reviewed. Similarly, we (Huaxin, Felix, and myself in person, with several other committers helping virtually/async) have organized new-contributor-to-Spark sprints where we try to make sure we’ve got dedicated review bandwidth for the new folks. The last one was in Seattle, and I think we might try to do a Bay Area one and then a virtual one.

Reviewer burnout is a real challenge in OSS, and while many of the PMC may work on or with Spark in their day jobs, review time is often not something their employers prioritize or reward (at least in my experience it’s the exception rather than the norm).
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/ <https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

On Sat, May 30, 2026 at 6:52 AM Mich Talebzadeh <[email protected]> wrote:

> My view is that this discussion has reached "*the point of diminishing
> returns*". At this stage, there is very little technical content left,
> and the focus has shifted to questions of attribution and individual
> grievances, *whether justified or not*. I understand Asif's concerns, but
> Spark is ultimately about improving the software, not awarding prizes for
> who first identified an issue or proposed a solution. Most people here are
> engineers focused on solving technical problems and moving the project
> forward.
>
> The positions have been stated clearly. Perhaps it is time to accept that
> not everyone will agree and move on, returning the focus to Spark itself.
>
> HTH
>
> Mich Talebzadeh,
>
> Data Scientist | Distributed Systems (Spark) | Financial & Metadata
> Forensics | Transaction Reconstruction | Audit Analytics | Critical Data
> Element (CDE) Traceability
>
> PhD <https://en.wikipedia.org/wiki/Doctor_of_Philosophy> Imperial College
> London <https://en.wikipedia.org/wiki/Imperial_College_London>
>
> On Sat, 30 May 2026 at 10:08, Andrew Melo <[email protected]> wrote:
>
>> Hello Wenchen,
>>
>> I think that there is some level of irony that Asif's thread about having
>> PRs be ignored/co-opted is interleaved with the announcement of two new
>> committers being added to the project. It's a bit telling that there is
>> more engagement on each of those individual congratulations thread than
>> this thread.
>>
>> I do have one strong critique of your mail below, and I think that
>> Varquar nailed the implications:
>>
>> On Sat, May 30, 2026 at 13:18 Wenchen Fan <[email protected]> wrote:
>>
>>> I can understand the frustration of PRs being ignored while authors have
>>> put a lot of effort on it. In an ideal world, all PRs deserve a decent
>>> review/discussion from the community and at least one committer, so that
>>> the PR can get merged or rejected with a clear reason. However, the reality
>>> is: committers have limited time and attention, actively monitoring all
>>> open PRs and JIRAs is nearly impossible. I just checked and there are 360+
>>> open PRs right now.
>>>
>>> I don't have a good solution here. I've been trying my best to review
>>> PRs that ping me, but there is no guaranteed review (sometimes I was just
>>> too tired, or misclicked the "mark as read" button). I don't think there is
>>> a "control" here, it's all about how you can get attention. Building your
>>> credibility with small tasks (which gets attention easier) is one way.
>>>
>>
>> I share Asif's experience -- I tried to put in a small change which was
>> ignored and then later closed because nobody had time to review it. Perhaps
>> you find it unfair, but I think that Varquar is correct -- if new
>> users/contributers cant even get someone to look at trivial patches, then
>> how can....
>>
>>> @huaxin gao <[email protected]> is proposing a Monthly Spark
>>> Community Sync, which may be a good place to present complex PRs/proposals
>>> and get attention.
>>>
>>
>> ... you even get to the place that you're looking at complex PRs.
>>
>> There was some people up-thread who critiqued the usage of "cabal" which
>> I think is easy to chalk up to a language difference.
>>
>> As a native English speaker, I wouldn't use that word because "cabal" has
>> a perjorative meaning (maybe "in-group" fits better), but the outcome is
>> the same.
>> If you're in the in-group then you can make code happen in hours,
>> otherwise you have to either hope someone reviews your code or you pester
>> them to look at it.
>>
>> I spent a lot of time both writing spark plugins and otherwise
>> championing the use of spark in high energy physics, got disillusioned, and
>> moved on to other frameworks because of how the project is structured and
>> incentivized. There is a lot of promise in Spark from a technical
>> standpoint and I hope that it can continue to grow.
>>
>> Sincerely
>> Andrew
>>
>>> On Sat, May 30, 2026 at 12:29 AM Asif Shahid <[email protected]> wrote:
>>>
>>>> This control , exclusivity, the requirement to build credibility by
>>>> starting small ( like fixing formatting , stats ,log) , to leave complex
>>>> issues to other " bright " minds, the informal hierarchy, may be this is
>>>> how the open source works.
>>>> Whatever it's , it does not sound "community " to me.
>>>> This is a club ( cartel is offensive to some or most), with usual
>>>> struggle for power , control and politics.
>>>>
>>>> On Fri, May 29, 2026, 8:31 AM Asif Shahid <[email protected]> wrote:
>>>>
>>>>> The last line is to be read as
>>>>>
>>>>> That is how exclusivity and good control is to be maintained.
>>>>>
>>>>> On Fri, May 29, 2026, 8:29 AM Asif Shahid <[email protected]> wrote:
>>>>>
>>>>>> To the open source C,
>>>>>> As it's apparent to me and I believe tacitly admitted by the group in
>>>>>> general and heard explicitly in person
>>>>>> Any relatively complex PR which involves deeper thinking ( be it
>>>>>> functional or performance issue) should be the business of member.
>>>>>> If it's performance issue , no way .
>>>>>> If it's functional issue which is becoming embarrassment to ignore,
>>>>>> somehow ensure that the push happens under a member's PR.
>>>>>>
>>>>>> That is how exclusivity and good is to be maintained.
>>>>>>
>>>>>> On Fri, May 29, 2026, 8:05 AM Asif Shahid <[email protected]> wrote:
>>>>>>
>>>>>>> Based on the data I have and discussed, it's my view that the PRs
>>>>>>> opened by you were reactive, happening only after I had opened the initial
>>>>>>> ticket and PRs.
>>>>>>> You are talking about simplifying the issue
>>>>>>> https://github.com/apache/spark/pull/50757#discussion_r2069390537,
>>>>>>> I am willing to discuss it here ,over meeting with other members of
>>>>>>> your open source group, as to how it simplifies?
>>>>>>>
>>>>>>> In fact , I had repeatedly said that why are we discussing in
>>>>>>> internal channel of company for the PR which I had filed in public Open
>>>>>>> source . In that discussion ( the last one, before I was made redundant by
>>>>>>> company), I had given detailed explanation of why making each plan node
>>>>>>> emit indeterministic is bad idea. ( I would ask you to make that last
>>>>>>> slack public, but I am sure that would be an issue as your company policy
>>>>>>> might prohibit).
>>>>>>>
>>>>>>> I understood much earlier why you and your colleague never wanted
>>>>>>> technical discussions on my public PRs on PR itself..
>>>>>>>
>>>>>>> The same holds for other alternate PRs including the issue of
>>>>>>> "self joins".
>>>>>>> I am willing to discuss it out with your group members, the problem
>>>>>>> it solves and what your alternative PR does not.
>>>>>>>
>>>>>>> I am not sure if this is generic approach of the "members", to
>>>>>>> ensure that final checkin happens under their authorship.
>>>>>>>
>>>>>>> On Fri, May 29, 2026, 1:58 AM Peter Toth <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Viquar,
>>>>>>>>
>>>>>>>>> To resolve the immediate discrepancy, I ask that we formally link
>>>>>>>>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed by," and
>>>>>>>>> add a JIRA comment explicitly crediting Asif as the original co-discoverer
>>>>>>>>> of both the regression and the baseline fix. This standard attribution
>>>>>>>>> costs us nothing but preserves the integrity of our commit history.
>>>>>>>>
>>>>>>>> SPARK-56694 is a duplicate of SPARK-45658 (not SPARK-45866), but I
>>>>>>>> agree it's a fair point to link the tickets and mention Asif's previous
>>>>>>>> work. Let me add a comment to both the ticket and the PR.
>>>>>>>>
>>>>>>>>> Conversely, SPARK-56694 bypassed the queue and was merged within
>>>>>>>>> eight hours
>>>>>>>>
>>>>>>>> I don't know, is there a queue? As for my work process, when I have
>>>>>>>> some time for upstream reviews, I don't follow any queue. I just pick PRs
>>>>>>>> that I find interesting or that relate to my experience with Spark. And
>>>>>>>> despite its size,
>>>>>>>> https://github.com/apache/spark/pull/55644/changes is technically
>>>>>>>> just a one-liner, fairly trivial fix so review within 8 hours isn't
>>>>>>>> extraordinary.
>>>>>>>>
>>>>>>>> Hi Asif,
>>>>>>>>
>>>>>>>>> you opened an alternate PR, which...
>>>>>>>>>
>>>>>>>>> What issue did u see in the logic, that an alternate PR was
>>>>>>>>> opened...
>>>>>>>>
>>>>>>>> I think the reason for my simplification approach was discussed
>>>>>>>> both offline and online in this thread:
>>>>>>>> https://github.com/apache/spark/pull/50757#discussion_r2069390537
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On Thu, May 28, 2026 at 10:29 PM vaquar khan <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have thoroughly reviewed the technical artifacts surrounding the
>>>>>>>>> recent Catalyst optimizer canonicalization discussions to help guide this
>>>>>>>>> toward a constructive resolution.
>>>>>>>>>
>>>>>>>>> We must address a tangible breakdown in our review pipeline.
>>>>>>>>> SPARK-45866 and its corresponding PR #49154 correctly identified this
>>>>>>>>> complex Catalyst regression in late 2023, yet the ticket remained
>>>>>>>>> unaddressed. *Conversely, SPARK-56694 bypassed the queue and was
>>>>>>>>> merged within eight hours without referencing the prior art*.
>>>>>>>>> Peter has transparently acknowledged the oversight in searching for
>>>>>>>>> existing tickets, but we still need to close the loop.
>>>>>>>>>
>>>>>>>>> To resolve the immediate discrepancy, *I ask that we formally
>>>>>>>>> link SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed
>>>>>>>>> by," and add a JIRA comment explicitly crediting Asif as the original
>>>>>>>>> co-discoverer of both the regression and the baseline fix. This standard
>>>>>>>>> attribution costs us nothing but preserves the integrity of our commit
>>>>>>>>> history.*
>>>>>>>>>
>>>>>>>>> Stepping back, this incident highlights a critical systemic risk
>>>>>>>>> to our contributor ecosystem. The stark asymmetry in review velocity where
>>>>>>>>> an external contributor's highly complex PR sits stagnant for months/years,
>>>>>>>>> while an identical internal PR is merged in hours creates visible friction.
>>>>>>>>> Even if entirely unintentional due to organizational overload, this pattern
>>>>>>>>> discourages the high-level engineering talent required to sustain the
>>>>>>>>> project's momentum.
>>>>>>>>>
>>>>>>>>> To maintain Spark’s technical leadership, we must actively
>>>>>>>>> cultivate a culture where contributions are prioritized strictly by their
>>>>>>>>> architectural merit, regardless of authorship. Furthermore, we must
>>>>>>>>> normalize the habit of proactively acknowledging independent work when
>>>>>>>>> parallel discoveries surface. Small, intentional shifts in our governance
>>>>>>>>> and review cadence will yield massive dividends in community trust and
>>>>>>>>> long-term innovation.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Viquar Khan
>>>>>>>>> https://www.linkedin.com/in/vaquar-khan-b695577/?skipRedirect=true
>>>>>>>>>
>>>>>>>>> On Thu, 28 May 2026 at 13:42, Asif Shahid <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Also I must admit that I did not know oss works by opening
>>>>>>>>>> alternate PRs.
>>>>>>>>>> In the places where I have worked most of my life, we work on the
>>>>>>>>>> opened PR with the original author and try to bridge the gap.
>>>>>>>>>>
>>>>>>>>>> On Thu, May 28, 2026, 11:25 AM Asif Shahid <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> In fact, I showed it not just to you but other colleague of
>>>>>>>>>>> yours too. But there has been absolutely no comment or anything on that
>>>>>>>>>>> from then , till now.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 28, 2026 at 11:19 AM Asif Shahid <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> also take a look at this jira
>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-47320
>>>>>>>>>>>> for this also an alternate PR was opened.
>>>>>>>>>>>> This problem is do deep in code, that I even showed you that in
>>>>>>>>>>>> the existing test itself, if the join condition's operand are
>>>>>>>>>>>> swapped, test fails.. Its completely broken , the self joins.
>>>>>>>>>>>> I had proposed a consistent fix, which solves the issue
>>>>>>>>>>>> completely and logically, but again an alternate PR was filed..
>>>>>>>>>>>> What issue was there in my PR , which I created...?
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Asif
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 28, 2026 at 11:14 AM Asif Shahid <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, May 28, 2026 at 10:56 AM Peter Toth <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for the fix, itself, is not indicative of any thing as its
>>>>>>>>>>>>>>> a one liner, test has uncanny resemblance
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Asif, what exactly is the "uncanny resemblance" between those
>>>>>>>>>>>>>> test cases in
>>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs
>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes ? Besides
>>>>>>>>>>>>>> the fact that obviously they are comparing canonicalized forms.
>>>>>>>>>>>>>> Again, sorry for not noticing your PR, but I don't feel my
>>>>>>>>>>>>>> fix has anything to do with yours.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok. I respect your opinion. Each one is entitled to its own view
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Look at bug
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>>>>>>>>> To discover this bug and reproduce it reliably, I spent
>>>>>>>>>>>>>>> nearly 2- 3 weeks. I filed a PR. The bug was fixed via a different PR ,
>>>>>>>>>>>>>>> taken a different route.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you see anything in common between
>>>>>>>>>>>>>> https://github.com/apache/spark/pull/50029/changes and
>>>>>>>>>>>>>> https://github.com/apache/spark/pull/50757/changes ?
>>>>>>>>>>>>>> Because I do see. That someone else had a much better idea:
>>>>>>>>>>>>>> https://github.com/apache/spark/pull/50757#issuecomment-2844972082
>>>>>>>>>>>>>> / https://github.com/apache/spark/pull/50230 and it was
>>>>>>>>>>>>>> implemented for the benefit of Spark.
>>>>>>>>>>>>>> IMO, that's the normal way of dealing with issues in an
>>>>>>>>>>>>>> open-source project. Ideas come and go and hopefully the one
>>>>>>>>>>>>>> best wins.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> The checksum approach has its expense. That can come later ,
>>>>>>>>>>>>> because apriori its possible to detect whether the expression is returning
>>>>>>>>>>>>> value from an indeterministic expression.
>>>>>>>>>>>>> You opened an alternate PR, which I have described in the PR
>>>>>>>>>>>>> discussion that to fix the round robin issue which you were dealing with,
>>>>>>>>>>>>> you are trying to impose an order in in-deterministic expression
>>>>>>>>>>>>> evaluattion, which itself is against the basic premise that if data is
>>>>>>>>>>>>> in-determinate, there cannot be order in it.
>>>>>>>>>>>>> What issue did u see in the logic, that an alternate PR was
>>>>>>>>>>>>> opened...which impacted all the stages ( including the ancestors?) and I
>>>>>>>>>>>>> already discussed internally why the idea you had in mind would not work. I
>>>>>>>>>>>>> specifically asked, why dont we discuss via the PR filed...
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, May 28, 2026 at 6:38 PM Asif Shahid <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Nicholas,
>>>>>>>>>>>>>>> You wanted some examples , right:
>>>>>>>>>>>>>>> 1) Look at bug
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>>>>>>>>> To discover this bug and reproduce it reliably, I spent
>>>>>>>>>>>>>>> nearly 2- 3 weeks. I filed a PR. The bug was fixed via a different PR ,
>>>>>>>>>>>>>>> taken a different route.
>>>>>>>>>>>>>>> Did any one who created new PR and route, showed up any
>>>>>>>>>>>>>>> unaddressable logical issue?
>>>>>>>>>>>>>>> The same goes for all the PRs ( which in case I have closed)
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Asif
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, May 28, 2026 at 9:06 AM Nicholas Chammas <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think repeatedly calling the contributors on this list a
>>>>>>>>>>>>>>>> “cartel” is not conducive to a calm and amicable resolution.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You may have some history built up that led you to use that
>>>>>>>>>>>>>>>> word, but to the rest of us it comes out of nowhere; you in fact opened
>>>>>>>>>>>>>>>> this thread with that attack. If you keep making your case in this manner,
>>>>>>>>>>>>>>>> you will just turn everyone against you.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If there is a history of what you feel is others stealing
>>>>>>>>>>>>>>>> your work, please link to a few examples so we can see what you are seeing.
>>>>>>>>>>>>>>>> If you can’t do that, then just focus on this current example.
>>>>>>>>>>>>>>>> And try to
>>>>>>>>>>>>>>>> refrain from calling people names unless your goal is just to have a fight,
>>>>>>>>>>>>>>>> as opposed to resolving the problematic behavior so you can continue to
>>>>>>>>>>>>>>>> contribute.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not a committer and don’t have any special role in
>>>>>>>>>>>>>>>> this community. I am speaking just as an observer and regular contributor
>>>>>>>>>>>>>>>> to the project.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > I have experienced this before, as recent as couple of
>>>>>>>>>>>>>>>> months back (
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For others following along, I took a look at this ticket
>>>>>>>>>>>>>>>> and the associated PRs: #53261
>>>>>>>>>>>>>>>> <https://github.com/apache/spark/pull/53261> / #53100
>>>>>>>>>>>>>>>> <https://github.com/apache/spark/pull/53100>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It looks like Asif is upset that he submitted a fix for the
>>>>>>>>>>>>>>>> same issue a week or so prior to the fix that eventually got merged. But
>>>>>>>>>>>>>>>> the fixes are different, and the one that got merged is a lot shorter,
>>>>>>>>>>>>>>>> though they are both simple. The PR that got merged was submitted by
>>>>>>>>>>>>>>>> someone who appears to be employed by Databricks; perhaps this is part of
>>>>>>>>>>>>>>>> the “cartel” accusation. The two PRs were reviewed by different committers,
>>>>>>>>>>>>>>>> however, and the one that got merged was merged in by someone who does
>>>>>>>>>>>>>>>> _not_ work for Databricks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don’t see anything here other than the normal dynamic of
>>>>>>>>>>>>>>>> a large and busy open source project.
>>>>>>>>>>>>>>>> Committer attention is limited;
>>>>>>>>>>>>>>>> things fall through the cracks; different contributors may occasionally
>>>>>>>>>>>>>>>> work on the same thing without knowing about each other. A minor help to
>>>>>>>>>>>>>>>> this specific problem would be to have some way of automatically linking
>>>>>>>>>>>>>>>> issues that appear to be about the same thing.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Nick
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On May 28, 2026, at 11:33 AM, Asif Shahid <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>> Pls see inline for comments/ replies
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, May 28, 2026 at 6:11 AM Peter Toth <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey Asif,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Are you referring to
>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs.
>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes? Those
>>>>>>>>>>>>>>>>> are definitely solving the same issue but I can assure you I wouldn't take
>>>>>>>>>>>>>>>>> any code from your PR without consulting with you first.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes Indeed Peter, I am referring to those.
>>>>>>>>>>>>>>>> As for the fix, itself, is not indicative of any thing as
>>>>>>>>>>>>>>>> its a one liner, test has uncanny resemblance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As far as I remember, I opened SPARK-56694 /
>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644 because I ran
>>>>>>>>>>>>>>>>> into that minor bug during the implementation of
>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55298.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sorry, I didn't check whether a ticket or PR already
>>>>>>>>>>>>>>>>> existed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The below I am addressing to the whole cartel.:
>>>>>>>>>>>>>>>> I have experienced this before, as recent as couple of
>>>>>>>>>>>>>>>> months back (
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>>>>>>>>> I have experienced, my personal effort ( going into weeks)
>>>>>>>>>>>>>>>> to debug, reproduce issue reliably , being hijacked by members, without
>>>>>>>>>>>>>>>> even discussing the fix proposed, ( by opening new PRs). ( If interested, I
>>>>>>>>>>>>>>>> can provide details of the PRs / issues I am talking about)
>>>>>>>>>>>>>>>> I have seen a perfectly valid PR being nixed , by following
>>>>>>>>>>>>>>>> comment which essentially said
>>>>>>>>>>>>>>>> " my code of making the cache lookup more effective ,
>>>>>>>>>>>>>>>> would result in greater chances of stale cache being picked, which already
>>>>>>>>>>>>>>>> spark suffers from."
>>>>>>>>>>>>>>>> Now the PR was related to collapsing the projects in
>>>>>>>>>>>>>>>> analysis phase, and side effect was cache pick up being more
>>>>>>>>>>>>>>>> sensitive.
>>>>>>>>>>>>>>>> So this is such a frivolous reason to nix the PR , because
>>>>>>>>>>>>>>>> "staleness" is an underlying existing issue which had nothing to do with my
>>>>>>>>>>>>>>>> PR. And its more amusing , that if a DB is giving even one wrong result in
>>>>>>>>>>>>>>>> millions, that makes all the results a suspect in any case. It does not
>>>>>>>>>>>>>>>> matter at what frequency this occurs. To me the real reason was code
>>>>>>>>>>>>>>>> complexity ( & more likely the loss of control of the code to the
>>>>>>>>>>>>>>>> outsider).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The reason I call this open source community as cartel, is
>>>>>>>>>>>>>>>> because, I have seen the way it works pretty closely and have experienced
>>>>>>>>>>>>>>>> it in the email exchanges which happen on this group.
>>>>>>>>>>>>>>>> For the same PR , same issue, if advertently or
>>>>>>>>>>>>>>>> inadvertently , other person ( especially a member) gets his changes
>>>>>>>>>>>>>>>> pushed, by the virtue of his standing/position and the "for profit" company
>>>>>>>>>>>>>>>> the person works, how would you give the credit to the original person who
>>>>>>>>>>>>>>>> discovered the issue first / provided the fix?
>>>>>>>>>>>>>>>> Why are issues filed by some immediately worked upon by
>>>>>>>>>>>>>>>> members ( some of whom claim to be working full time on spark) ? Is it
>>>>>>>>>>>>>>>> because certain companies / groups ( for profit companies, mind you )
>>>>>>>>>>>>>>>> exert undue control, or the petty newbee has to be in the good books of
>>>>>>>>>>>>>>>> members ( with the hope that at some point they will also reach that
>>>>>>>>>>>>>>>> position of power ?)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Given the AI advent and such occurrences, how will you
>>>>>>>>>>>>>>>> give due credit to the original creators and how do you plan to prevent
>>>>>>>>>>>>>>>> some member for taking up idea of any old open PR ( which for reasons of
>>>>>>>>>>>>>>>> complexity and non technical reasons) , polishing it up and pushing it as
>>>>>>>>>>>>>>>> their own?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am also curious , am I the only one who is troubled by
>>>>>>>>>>>>>>>> all this, or there are others who have experienced it?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>> Asif
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you have further improvements please feel free to open
>>>>>>>>>>>>>>>>> a PR.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> I had filed a bug
>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-45866
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I had also opened a PR for the same.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Now I see that the ticket I filed is still open, but the
>>>>>>>>>>>>>>>>>> issue has been fixed using a new ticket
>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> and on top of that the bug test and ofcourse the fix (
>>>>>>>>>>>>>>>>>> which in any case would be same) has been taken from my PR for
>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To me this is clear unethical conduct of cartel member,
>>>>>>>>>>>>>>>>>> unless I am missing some valid reason.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> And the irony is that the fix is still incomplete, as I
>>>>>>>>>>>>>>>>>> just found and filed a new ticket
>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-57126
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I know that atleast some cartel members are insecure and
>>>>>>>>>>>>>>>>>> think of OSS as their fiefdom, but this sort of behaviour , I never
>>>>>>>>>>>>>>>>>> expected.
>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>> Asif
