Hi Asif,

First of all, I understand your frustration. Even though cases like this
happen to many open source contributors from time to time, it's still super
discouraging and annoying.

I don't believe Peter, or any other committer, intentionally "stole" your
idea. It's a very common working process for committers to run into an
issue, find the cause and fix it. Unfortunately, due to various reasons (in
my case, JIRA search is very difficult to use), not all committers perform
a full search for existing tickets.

I took a look at your tickets and PRs, and I think the core issue is that
many of your PRs were not reviewed, so no one realized the problem had been
found and a fix proposed. The bug remained in the code base, and later
someone else found the same issue and fixed it.

It's an unfortunate but common problem in any open source community that
authors with higher reputations get more attention. The fundamental reason
is that we don't have enough eyes. With the amount of AI slop increasing
these days, it's even harder to properly review every single PR and
determine whether it is valid. Committers tend to treat PRs from other
committers (or frequent contributors) more seriously, resulting in faster
reviews.

I'm not saying this situation will magically get much better overnight,
but I do have a few suggestions that might help you.

First, make sure your PR is ready for review. This is not a criticism of
any of your PRs, but general guidance for anyone interested in contributing
to Spark (or any other open source project). Make sure your PRs are well
explained, properly tested, and pass all tests.

Then, tag some committers. Not everyone, but committers who understand your
code and own those components. This might take some time initially, but you
can start by examining commit histories. This will put your PR on their
radar.

Finally, address committers' comments quickly and tag them again if they
don't respond. It's an open source project; everyone needs to take some
initiative and build their reputation. No committer is obligated to review
every single PR.

I do hope things will get better for you eventually. An open source project
is a community and it needs effort from everyone who cares about it :)

Tian

On Thu, May 28, 2026 at 8:33 AM Asif Shahid <[email protected]> wrote:

> Hi Peter,
> Pls see inline for comments/ replies
>
> On Thu, May 28, 2026 at 6:11 AM Peter Toth <[email protected]> wrote:
>
>> Hey Asif,
>>
>> Are you referring to https://github.com/apache/spark/pull/49154/changes
>> vs. https://github.com/apache/spark/pull/55644/changes? Those are
>> definitely solving the same issue but I can assure you I wouldn't take any
>> code from your PR without consulting with you first.
>>
>  Yes Indeed Peter, I am referring to those.
> As for the fix, itself, is not indicative of any thing as its a one liner,
> test has uncanny resemblance.
>
>
>> As far as I remember, I opened SPARK-56694 /
>> https://github.com/apache/spark/pull/55644 because I ran into that minor
>> bug during the implementation of
>> https://github.com/apache/spark/pull/55298.
>>
>
>
>> Sorry, I didn't check whether a ticket or PR already existed.
>>
>
> The below I am addressing to the whole cartel.:
> I have experienced this before, as recent as couple of months back (
> https://issues.apache.org/jira/browse/SPARK-54386)
> I have experienced,  my personal effort ( going into weeks) to debug,
> reproduce issue reliably , being hijacked by members, without even
> discussing the fix proposed, ( by opening new PRs). ( If interested, I can
> provide details of the PRs / issues I am talking about)
> I have seen a perfectly valid PR being nixed , by following comment which
> essentially said
> "  my code of making the cache lookup more effective , would result in
> greater chances of stale cache being picked,  which already spark suffers
> from."
> Now the PR was related to collapsing the projects in analysis phase, and
> side effect was cache pick up being more sensitive.
> So this is such a frivolous reason to nix the PR , because "staleness" is
> an underlying existing issue which had nothing to do with my PR. And its
> more amusing , that if a DB is giving even one wrong result in millions,
> that makes all the results a suspect in any case. It does not matter at
> what frequency this occurs. To me the real reason was code complexity ( &
> more likely  the loss of control of the code to the outsider).
>
> The reason I call this open source community as cartel, is because, I have
> seen the way it works pretty closely and have experienced it in the email
> exchanges which happen on this group.
> For the same PR , same issue,  if advertently or inadvertently , other
> person ( especially a member) gets his changes pushed, by the virtue of his
> standing/position and the "for profit" company the person works, how would
> you give the credit to the original person who discovered the issue first /
> provided the fix?
> Why are issues filed by some immediately worked upon by members ( some of
> whom claim to be working full time on spark) ? Is it because certain
> companies / groups ( for profit companies, mind you )  exert undue
> control, or the petty newbee has to be in the good books of members ( with
> the hope that at some point they will also reach that position of power ?)
>
> Given the AI advent and such occurrences,  how will you give due credit to
> the original creators and how do you plan to prevent some member for taking
> up idea of any old open PR ( which for reasons of complexity and non
> technical reasons) ,  polishing it up and pushing it as their own?
>
> I am also curious , am I the only one who is troubled by all this, or
> there are others who have experienced it?
>
> Regards
> Asif
>
>
>> If you have further improvements please feel free to open a PR.
>>
>> Best,
>> Peter
>>
>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <[email protected]>
>> wrote:
>>
>>> Hi,
>>> I had filed a bug
>>>  https://issues.apache.org/jira/browse/SPARK-45866
>>>
>>> I had also opened a PR for the same.
>>>
>>> Now I see that the ticket I  filed is still open, but the issue has been
>>> fixed using a new ticket
>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>
>>> and on top of that the bug test and ofcourse the fix ( which in any case
>>> would be same) has been taken from my PR for
>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>
>>> To me this is clear unethical conduct of cartel member, unless I am
>>> missing some valid reason.
>>>
>>> And the irony is that the fix is still incomplete, as I just found and
>>> filed a new ticket
>>> https://issues.apache.org/jira/browse/SPARK-57126
>>>
>>> I know that atleast some cartel members are insecure and think of OSS as
>>> their fiefdom, but this sort of behaviour , I never expected.
>>> Regards
>>> Asif
>>>
>>
