Thank you all. I agree with Mich that it's reached its shelf life.
Points heard, points shared. I have no doubt that whatever all of us have
said was said in full honesty and reflects what we all genuinely believe.
From my side, this is my last email on this topic in the group.
Thank you Peter, Wenchen, Holden, Mich, Nicholas, Sean, Vaquar, Tian for
putting forth your views.

On Sat, May 30, 2026, 7:00 AM Holden Karau <[email protected]> wrote:

> I think the insistence that folks are acting in bad faith makes it
> hard for people to make progress, and I’m stoked that folks are discussing
> it. I know that I’ve run into this myself even as a committer, where I’ll
> find an issue, make a JIRA and a fix, and then someone else will merge a fix
> or propose a different solution before I merge mine, without searching for
> what already exists.
>
> I know I’ve done the same to other folks in reverse.
>
> I think this serves as a good reminder of using search at least for open
> PRs when we’re taking on projects.
>
> One of the things I’ve tried to do, albeit imperfectly, when something is
> time critical and I can’t do the full back-and-forth review cycle, is make
> a co-authored PR which includes others’ contributions cherry-picked in to
> give everyone credit for their work (of course this doesn’t always work,
> since sometimes different folks go in different directions on the same
> problem).
>
> As for the greater engagement on threads welcoming new committers, it’s
> easy to celebrate our successes (and we should! the world is harsh and joy
> is precious), but reading a thread that casts a group one may be a part of
> in a negative light is less fun.
>
> I am also hopeful that the periodic community syncs, while not making
> any decisions per se, will help foster more positive community interaction
> and make it easier for new folks to get their code reviewed.
>
> Similarly, we’ve (Huaxin, Felix and I in person, with several other
> committers helping virtually/async) organized new-contributor Spark
> sprints where we try to make sure we’ve got dedicated review bandwidth for
> the new folks. The last one was in Seattle, and I think we might try to do
> a Bay Area one and then a virtual one.
>
> Reviewer burnout is a real challenge in OSS, and while many of the PMC may
> work on or with Spark in their day jobs, the review time is often not
> something their employers prioritize or reward (at least in my experience
> it’s the exception rather than the norm).
>
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
> On Sat, May 30, 2026 at 6:52 AM Mich Talebzadeh <[email protected]>
> wrote:
>
>> My view is that this discussion has reached "*the point of diminishing
>> returns*". At this stage, there is very little technical content left,
>> and the focus has shifted to questions of attribution and individual
>> grievances, *whether justified or not*. I understand Asif's concerns,
>> but Spark is ultimately about improving the software, not awarding prizes
>> for who first identified an issue or proposed a solution. Most people here
>> are engineers focused on solving technical problems and moving the project
>> forward.
>>
>> The positions have been stated clearly. Perhaps it is time to accept that
>> not everyone will agree and move on, returning the focus to Spark itself.
>>
>> HTH
>>
>>  Mich Talebzadeh,
>>
>> Data Scientist | Distributed Systems (Spark) | Financial & Metadata
>> Forensics | Transaction Reconstruction | Audit Analytics | Critical Data
>> Element (CDE) Traceability
>>
>>
>>
>> PhD, Imperial College London
>>
>>
>>
>>
>> On Sat, 30 May 2026 at 10:08, Andrew Melo <[email protected]> wrote:
>>
>>> Hello Wenchen,
>>>
>>> I think there is some irony in the fact that Asif's thread about
>>> having PRs ignored/co-opted is interleaved with the announcement of two
>>> new committers being added to the project. It's a bit telling that there is
>>> more engagement on each of those individual congratulations threads than on
>>> this thread.
>>>
>>> I do have one strong critique of your mail below, and I think that
>>> Vaquar nailed the implications:
>>>
>>> On Sat, May 30, 2026 at 13:18 Wenchen Fan <[email protected]> wrote:
>>>
>>>> I can understand the frustration of PRs being ignored when authors
>>>> have put a lot of effort into them. In an ideal world, all PRs deserve a
>>>> decent review/discussion from the community and at least one committer, so
>>>> that the PR can get merged or rejected with a clear reason. However, the
>>>> reality is that committers have limited time and attention, and actively
>>>> monitoring all open PRs and JIRAs is nearly impossible. I just checked and
>>>> there are 360+ open PRs right now.
>>>>
>>>> I don't have a good solution here. I've been trying my best to review
>>>> PRs that ping me, but there is no guaranteed review (sometimes I was just
>>>> too tired, or misclicked the "mark as read" button). I don't think there is
>>>> a "control" here; it's all about how you can get attention. Building your
>>>> credibility with small tasks (which get attention more easily) is one way.
>>>>
>>>
>>> I share Asif's experience: I tried to put in a small change which was
>>> ignored and then later closed because nobody had time to review it. Perhaps
>>> you find it unfair, but I think that Vaquar is correct: if new
>>> users/contributors can't even get someone to look at trivial patches, then
>>> how can....
>>>
>>>
>>>  @huaxin gao <[email protected]> is proposing a Monthly Spark
>>>> Community Sync, which may be a good place to present complex PRs/proposals
>>>> and get attention.
>>>>
>>>
>>> ... you even get to the place that you're looking at complex PRs.
>>>
>>> There were some people up-thread who critiqued the usage of "cabal",
>>> which I think is easy to chalk up to a language difference.
>>>
>>> As a native English speaker, I wouldn't use that word because "cabal"
>>> has a pejorative meaning (maybe "in-group" fits better), but the outcome
>>> is the same. If you're in the in-group, then you can make code happen in
>>> hours; otherwise you have to either hope someone reviews your code or
>>> pester them to look at it.
>>>
>>> I spent a lot of time both writing Spark plugins and otherwise
>>> championing the use of Spark in high energy physics, got disillusioned, and
>>> moved on to other frameworks because of how the project is structured and
>>> incentivized. There is a lot of promise in Spark from a technical
>>> standpoint, and I hope that it can continue to grow.
>>>
>>> Sincerely
>>> Andrew
>>>
>>>
>>>> On Sat, May 30, 2026 at 12:29 AM Asif Shahid <[email protected]>
>>>> wrote:
>>>>
>>>>> This control, exclusivity, the requirement to build credibility by
>>>>> starting small (like fixing formatting, stats, logs), leaving complex
>>>>> issues to other "bright" minds, the informal hierarchy: maybe this is
>>>>> how open source works.
>>>>> Whatever it is, it does not sound like a "community" to me.
>>>>> This is a club ("cartel" is offensive to some or most), with the usual
>>>>> struggle for power, control and politics.
>>>>>
>>>>> On Fri, May 29, 2026, 8:31 AM Asif Shahid <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> The last line is to be read as
>>>>>>
>>>>>>
>>>>>> That is how exclusivity and good control  is to be maintained.
>>>>>>
>>>>>> On Fri, May 29, 2026, 8:29 AM Asif Shahid <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> To the open source C,
>>>>>>> As is apparent to me, and I believe tacitly admitted by the group
>>>>>>> in general and heard explicitly in person:
>>>>>>> Any relatively complex PR which involves deeper thinking (be it a
>>>>>>> functional or performance issue) should be the business of a member.
>>>>>>> If it's a performance issue, no way.
>>>>>>> If it's a functional issue which is becoming an embarrassment to ignore,
>>>>>>> somehow ensure that the push happens under a member's PR.
>>>>>>> That is how exclusivity and good is to be maintained.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 29, 2026, 8:05 AM Asif Shahid <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Based on the data I have and have discussed, it's my view that the PRs
>>>>>>>> opened by you were reactive, happening only after I had opened the
>>>>>>>> initial ticket and PRs.
>>>>>>>> You talk about simplifying the issue
>>>>>>>> (https://github.com/apache/spark/pull/50757#discussion_r2069390537).
>>>>>>>> I am willing to discuss here, or in a meeting with other members
>>>>>>>> of your open source group, how it simplifies things.
>>>>>>>>
>>>>>>>> In fact, I had repeatedly asked why we were discussing, in an
>>>>>>>> internal company channel, a PR which I had filed publicly in open
>>>>>>>> source. In that discussion (the last one, before I was made redundant
>>>>>>>> by the company), I gave a detailed explanation of why making each plan
>>>>>>>> node emit indeterministic is a bad idea. (I would ask you to make that
>>>>>>>> last Slack thread public, but I am sure that would be an issue, as your
>>>>>>>> company policy might prohibit it.)
>>>>>>>>
>>>>>>>> I understood much earlier why you and your colleague never wanted
>>>>>>>> technical discussions of my public PRs to happen on the PRs themselves.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The same holds for other alternate PRs, including the one for the
>>>>>>>> "self joins" issue.
>>>>>>>> I am willing to discuss with your group members the problem my PR
>>>>>>>> solves that your alternative PR does not.
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not sure whether this is a general approach of the "members", to
>>>>>>>> ensure that the final check-in happens under their authorship.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 29, 2026, 1:58 AM Peter Toth <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Viquar,
>>>>>>>>>
>>>>>>>>>> To resolve the immediate discrepancy, I ask that we formally link
>>>>>>>>>> SPARK-45866 / PR #49154 within SPARK-56694 as "previously proposed
>>>>>>>>>> by," and add a JIRA comment explicitly crediting Asif as the original
>>>>>>>>>> co-discoverer of both the regression and the baseline fix. This
>>>>>>>>>> standard attribution costs us nothing but preserves the integrity of
>>>>>>>>>> our commit history.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> SPARK-56694 is a duplicate of SPARK-45658 (not SPARK-45866), but I
>>>>>>>>> agree it's a fair point to link the tickets and mention Asif's
>>>>>>>>> previous work. Let me add a comment to both the ticket and the PR.
>>>>>>>>>
>>>>>>>>>> Conversely, SPARK-56694 bypassed the queue and was merged within
>>>>>>>>>> eight hours
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't know, is there a queue? As for my work process, when I
>>>>>>>>> have some time for upstream reviews, I don't follow any queue. I just
>>>>>>>>> pick PRs that I find interesting or that relate to my experience with
>>>>>>>>> Spark. And despite its size,
>>>>>>>>> https://github.com/apache/spark/pull/55644/changes is technically
>>>>>>>>> just a one-liner, fairly trivial fix, so review within 8 hours isn't
>>>>>>>>> extraordinary.
>>>>>>>>>
>>>>>>>>> Hi Asif,
>>>>>>>>>
>>>>>>>>>> You opened an alternate PR, which...
>>>>>>>>>>
>>>>>>>>>> What issue did you see in the logic, that an alternate PR was
>>>>>>>>>> opened...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think the reason for my simplification approach was discussed
>>>>>>>>> both offline and online in this thread:
>>>>>>>>> https://github.com/apache/spark/pull/50757#discussion_r2069390537
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On Thu, May 28, 2026 at 10:29 PM vaquar khan <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I have thoroughly reviewed the technical artifacts surrounding
>>>>>>>>>> the recent Catalyst optimizer canonicalization discussions to help
>>>>>>>>>> guide this toward a constructive resolution.
>>>>>>>>>>
>>>>>>>>>> We must address a tangible breakdown in our review pipeline.
>>>>>>>>>> SPARK-45866 and its corresponding PR #49154 correctly identified this
>>>>>>>>>> complex Catalyst regression in late 2023, yet the ticket remained
>>>>>>>>>> unaddressed. *Conversely, SPARK-56694 bypassed the queue and was
>>>>>>>>>> merged within eight hours without referencing the prior art*.
>>>>>>>>>> Peter has transparently acknowledged the oversight in searching for
>>>>>>>>>> existing tickets, but we still need to close the loop.
>>>>>>>>>>
>>>>>>>>>> To resolve the immediate discrepancy, *I ask that we formally
>>>>>>>>>> link SPARK-45866 / PR #49154 within SPARK-56694 as "previously
>>>>>>>>>> proposed by," and add a JIRA comment explicitly crediting Asif as the
>>>>>>>>>> original co-discoverer of both the regression and the baseline fix.
>>>>>>>>>> This standard attribution costs us nothing but preserves the
>>>>>>>>>> integrity of our commit history.*
>>>>>>>>>>
>>>>>>>>>> Stepping back, this incident highlights a critical systemic risk
>>>>>>>>>> to our contributor ecosystem. The stark asymmetry in review velocity,
>>>>>>>>>> where an external contributor's highly complex PR sits stagnant for
>>>>>>>>>> months or years while an identical internal PR is merged in hours,
>>>>>>>>>> creates visible friction. Even if entirely unintentional due to
>>>>>>>>>> organizational overload, this pattern discourages the high-level
>>>>>>>>>> engineering talent required to sustain the project's momentum.
>>>>>>>>>>
>>>>>>>>>> To maintain Spark’s technical leadership, we must actively
>>>>>>>>>> cultivate a culture where contributions are prioritized strictly by
>>>>>>>>>> their architectural merit, regardless of authorship. Furthermore, we
>>>>>>>>>> must normalize the habit of proactively acknowledging independent
>>>>>>>>>> work when parallel discoveries surface. Small, intentional shifts in
>>>>>>>>>> our governance and review cadence will yield massive dividends in
>>>>>>>>>> community trust and long-term innovation.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Viquar Khan
>>>>>>>>>> https://www.linkedin.com/in/vaquar-khan-b695577/?skipRedirect=true
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, 28 May 2026 at 13:42, Asif Shahid <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Also, I must admit that I did not know OSS works by opening
>>>>>>>>>>> alternate PRs.
>>>>>>>>>>> In the places where I have worked most of my life, we work on
>>>>>>>>>>> the opened PR with the original author and try to bridge the gap.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 28, 2026, 11:25 AM Asif Shahid <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> In fact, I showed it not just to you but to another colleague of
>>>>>>>>>>>> yours too. But there has been absolutely no comment or anything on
>>>>>>>>>>>> that from then until now.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 28, 2026 at 11:19 AM Asif Shahid <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Also take a look at this JIRA:
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-47320
>>>>>>>>>>>>> For this one, too, an alternate PR was opened.
>>>>>>>>>>>>> This problem is so deep in the code that I even showed you that
>>>>>>>>>>>>> in the existing test itself, if the join condition's operands are
>>>>>>>>>>>>> swapped, the test fails. Self joins are completely broken.
>>>>>>>>>>>>> I had proposed a consistent fix, which solves the issue
>>>>>>>>>>>>> completely and logically, but again an alternate PR was filed.
>>>>>>>>>>>>> What issue was there in the PR which I created?
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Asif
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, May 28, 2026 at 11:14 AM Asif Shahid <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, May 28, 2026 at 10:56 AM Peter Toth <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As for the fix itself, it is not indicative of anything, as
>>>>>>>>>>>>>>>> it's a one-liner; the test has an uncanny resemblance
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Asif, what exactly is the "uncanny resemblance" between
>>>>>>>>>>>>>>> those test cases in
>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs
>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes ?
>>>>>>>>>>>>>>> Besides the fact that obviously they are comparing 
>>>>>>>>>>>>>>> canonicalized forms.
>>>>>>>>>>>>>>> Again, sorry for not noticing your PR, but I don't feel my
>>>>>>>>>>>>>>> fix has anything to do with yours.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> OK. I respect your opinion. Everyone is entitled to their own
>>>>>>>>>>>>>> view.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Look at bug
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>>>>>>>>>> To discover this bug and reproduce it reliably, I spent
>>>>>>>>>>>>>>>> nearly 2-3 weeks. I filed a PR. The bug was fixed via a
>>>>>>>>>>>>>>>> different PR that took a different route.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you see anything in common between
>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/50029/changes and
>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/50757/changes ?
>>>>>>>>>>>>>>> What I do see is that someone else had a much better idea:
>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/50757#issuecomment-2844972082
>>>>>>>>>>>>>>> / https://github.com/apache/spark/pull/50230, and it was
>>>>>>>>>>>>>>> implemented for the benefit of Spark.
>>>>>>>>>>>>>>> IMO, that's the normal way of dealing with issues in an
>>>>>>>>>>>>>>> open-source project. Ideas come and go, and hopefully the best
>>>>>>>>>>>>>>> one wins.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The checksum approach has its own cost. That can come later,
>>>>>>>>>>>>>> because a priori it is possible to detect whether an expression is
>>>>>>>>>>>>>> returning a value from an indeterministic expression.
>>>>>>>>>>>>>> You opened an alternate PR, which, as I described in the PR
>>>>>>>>>>>>>> discussion, tries to fix the round-robin issue you were dealing
>>>>>>>>>>>>>> with by imposing an order on indeterministic expression
>>>>>>>>>>>>>> evaluation. That itself goes against the basic premise that if data
>>>>>>>>>>>>>> is indeterminate, there cannot be an order in it.
>>>>>>>>>>>>>> What issue did you see in the logic, that an alternate PR was
>>>>>>>>>>>>>> opened... one which impacted all the stages (including the
>>>>>>>>>>>>>> ancestors?)? I had already discussed internally why the idea you
>>>>>>>>>>>>>> had in mind would not work. I specifically asked why we didn't
>>>>>>>>>>>>>> discuss it via the PR that had been filed...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, May 28, 2026 at 6:38 PM Asif Shahid <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Nicholas,
>>>>>>>>>>>>>>>> You wanted some examples, right?
>>>>>>>>>>>>>>>> 1) Look at bug
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-51016
>>>>>>>>>>>>>>>> To discover this bug and reproduce it reliably, I spent
>>>>>>>>>>>>>>>> nearly 2-3 weeks. I filed a PR. The bug was fixed via a
>>>>>>>>>>>>>>>> different PR that took a different route.
>>>>>>>>>>>>>>>> Did anyone who created the new PR and route point out any
>>>>>>>>>>>>>>>> unaddressable logical issue?
>>>>>>>>>>>>>>>> The same goes for all the PRs (which in any case I have closed).
>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>> Asif
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, May 28, 2026 at 9:06 AM Nicholas Chammas <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think repeatedly calling the contributors on this list a
>>>>>>>>>>>>>>>>> “cartel” is not conducive to a calm and amicable resolution.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You may have some history built up that led you to use
>>>>>>>>>>>>>>>>> You may have some history built up that led you to use
>>>>>>>>>>>>>>>>> that word, but to the rest of us it comes out of nowhere; you
>>>>>>>>>>>>>>>>> in fact opened this thread with that attack. If you keep making
>>>>>>>>>>>>>>>>> your case in this manner, you will just turn everyone against
>>>>>>>>>>>>>>>>> you.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If there is a history of what you feel is others stealing
>>>>>>>>>>>>>>>>> your work, please link to a few examples so we can see what
>>>>>>>>>>>>>>>>> you are seeing. If you can’t do that, then just focus on this
>>>>>>>>>>>>>>>>> current example. And try to refrain from calling people names
>>>>>>>>>>>>>>>>> unless your goal is just to have a fight, as opposed to
>>>>>>>>>>>>>>>>> resolving the problematic behavior so you can continue to
>>>>>>>>>>>>>>>>> contribute.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am not a committer and don’t have any special role in
>>>>>>>>>>>>>>>>> this community. I am speaking just as an observer and regular
>>>>>>>>>>>>>>>>> contributor to the project.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > I have experienced this before, as recently as a couple of
>>>>>>>>>>>>>>>>> > months back (https://issues.apache.org/jira/browse/SPARK-54386)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For others following along, I took a look at this ticket
>>>>>>>>>>>>>>>>> and the associated PRs: #53261
>>>>>>>>>>>>>>>>> <https://github.com/apache/spark/pull/53261> / #53100
>>>>>>>>>>>>>>>>> <https://github.com/apache/spark/pull/53100>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It looks like Asif is upset that he submitted a fix for
>>>>>>>>>>>>>>>>> the same issue a week or so prior to the fix that eventually
>>>>>>>>>>>>>>>>> got merged. But the fixes are different, and the one that got
>>>>>>>>>>>>>>>>> merged is a lot shorter, though they are both simple. The PR
>>>>>>>>>>>>>>>>> that got merged was submitted by someone who appears to be
>>>>>>>>>>>>>>>>> employed by Databricks; perhaps this is part of the “cartel”
>>>>>>>>>>>>>>>>> accusation. The two PRs were reviewed by different committers,
>>>>>>>>>>>>>>>>> however, and the one that got merged was merged in by someone
>>>>>>>>>>>>>>>>> who does _not_ work for Databricks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don’t see anything here other than the normal dynamic of
>>>>>>>>>>>>>>>>> a large and busy open source project. Committer attention is
>>>>>>>>>>>>>>>>> limited; things fall through the cracks; different contributors
>>>>>>>>>>>>>>>>> may occasionally work on the same thing without knowing about
>>>>>>>>>>>>>>>>> each other. A minor help to this specific problem would be to
>>>>>>>>>>>>>>>>> have some way of automatically linking issues that appear to be
>>>>>>>>>>>>>>>>> about the same thing.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Nick
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On May 28, 2026, at 11:33 AM, Asif Shahid <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>> Pls see inline for comments/ replies
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, May 28, 2026 at 6:11 AM Peter Toth <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hey Asif,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Are you referring to
>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes vs.
>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644/changes?
>>>>>>>>>>>>>>>>>> Those are definitely solving the same issue, but I can assure
>>>>>>>>>>>>>>>>>> you I wouldn't take any code from your PR without consulting
>>>>>>>>>>>>>>>>>> with you first.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes indeed, Peter, I am referring to those.
>>>>>>>>>>>>>>>>> As for the fix itself, it is not indicative of anything, as
>>>>>>>>>>>>>>>>> it's a one-liner; the test has an uncanny resemblance.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As far as I remember, I opened SPARK-56694 /
>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55644 because I ran
>>>>>>>>>>>>>>>>>> into that minor bug during the implementation of
>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/55298.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sorry, I didn't check whether a ticket or PR already
>>>>>>>>>>>>>>>>>> existed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The below I am addressing to the whole cartel:
>>>>>>>>>>>>>>>>> I have experienced this before, as recently as a couple of
>>>>>>>>>>>>>>>>> months back (https://issues.apache.org/jira/browse/SPARK-54386).
>>>>>>>>>>>>>>>>> I have experienced my personal effort (going into weeks) to
>>>>>>>>>>>>>>>>> debug and reliably reproduce an issue being hijacked by members
>>>>>>>>>>>>>>>>> (by opening new PRs), without even a discussion of the fix I
>>>>>>>>>>>>>>>>> proposed. (If interested, I can provide details of the
>>>>>>>>>>>>>>>>> PRs/issues I am talking about.)
>>>>>>>>>>>>>>>>> I have seen a perfectly valid PR being nixed by a comment
>>>>>>>>>>>>>>>>> which essentially said:
>>>>>>>>>>>>>>>>> "my code making the cache lookup more effective would result
>>>>>>>>>>>>>>>>> in greater chances of a stale cache being picked, which Spark
>>>>>>>>>>>>>>>>> already suffers from."
>>>>>>>>>>>>>>>>> Now, the PR was related to collapsing projects in the
>>>>>>>>>>>>>>>>> analysis phase, and a side effect was that cache pickup became
>>>>>>>>>>>>>>>>> more sensitive.
>>>>>>>>>>>>>>>>> So this is a frivolous reason to nix the PR, because
>>>>>>>>>>>>>>>>> "staleness" is an existing underlying issue which had nothing to
>>>>>>>>>>>>>>>>> do with my PR. And it's all the more amusing because, if a DB
>>>>>>>>>>>>>>>>> gives even one wrong result in millions, that makes all the
>>>>>>>>>>>>>>>>> results suspect in any case. It does not matter at what
>>>>>>>>>>>>>>>>> frequency this occurs. To me, the real reason was code
>>>>>>>>>>>>>>>>> complexity (and, more likely, the loss of control of the code
>>>>>>>>>>>>>>>>> to an outsider).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The reason I call this open source community a cartel is
>>>>>>>>>>>>>>>>> that I have seen the way it works pretty closely and have
>>>>>>>>>>>>>>>>> experienced it in the email exchanges which happen on this group.
>>>>>>>>>>>>>>>>> For the same PR and the same issue, if, advertently or
>>>>>>>>>>>>>>>>> inadvertently, another person (especially a member) gets his
>>>>>>>>>>>>>>>>> changes pushed by virtue of his standing/position and the "for
>>>>>>>>>>>>>>>>> profit" company the person works for, how would you give credit
>>>>>>>>>>>>>>>>> to the original person who discovered the issue first/provided
>>>>>>>>>>>>>>>>> the fix?
>>>>>>>>>>>>>>>>> Why are issues filed by some immediately worked on by
>>>>>>>>>>>>>>>>> members (some of whom claim to be working full time on Spark)?
>>>>>>>>>>>>>>>>> Is it because certain companies/groups (for-profit companies,
>>>>>>>>>>>>>>>>> mind you) exert undue control, or does the petty newbie have to
>>>>>>>>>>>>>>>>> be in the good books of members (with the hope that at some point
>>>>>>>>>>>>>>>>> they will also reach that position of power)?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Given the advent of AI and such occurrences, how will you
>>>>>>>>>>>>>>>>> give due credit to the original creators, and how do you plan to
>>>>>>>>>>>>>>>>> prevent a member from taking up the idea of an old open PR (which
>>>>>>>>>>>>>>>>> remained unmerged for reasons of complexity and non-technical
>>>>>>>>>>>>>>>>> reasons), polishing it up and pushing it as their own?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am also curious: am I the only one who is troubled by
>>>>>>>>>>>>>>>>> all this, or are there others who have experienced it?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>> Asif
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you have further improvements please feel free to open
>>>>>>>>>>>>>>>>>> a PR.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>> I had filed a bug:
>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-45866
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I had also opened a PR for it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Now I see that the ticket I filed is still open, but
>>>>>>>>>>>>>>>>>>> the issue has been fixed under a new ticket:
>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> and on top of that, the bug test and of course the fix (which
>>>>>>>>>>>>>>>>>>> in any case would be the same) have been taken from my PR:
>>>>>>>>>>>>>>>>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> To me this is clearly unethical conduct by a cartel member,
>>>>>>>>>>>>>>>>>>> unless I am missing some valid reason.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> And the irony is that the fix is still incomplete, as I
>>>>>>>>>>>>>>>>>>> just found and filed a new ticket
>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-57126
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I know that at least some cartel members are insecure and
>>>>>>>>>>>>>>>>>>> think of OSS as their fiefdom, but I never expected this sort
>>>>>>>>>>>>>>>>>>> of behaviour.
>>>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>>>> Asif
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
