Thank you, Tian. Your points are fair and valid.
On Thu, May 28, 2026 at 9:14 AM Tian Gao via dev <[email protected]> wrote:

> Hi Asif,
>
> First of all, I understand your frustration. Even though cases like this
> happen to many open source contributors from time to time, it's still super
> discouraging and annoying.
>
> I don't believe Peter, or any other committer, intentionally "stole" your
> idea. It's a very common working process for committers to run into an
> issue, find the cause and fix it. Unfortunately, due to various reasons (in
> my case, JIRA search is very difficult to use), not all committers perform
> a full search for existing tickets.
>
> I took a look at your tickets and PRs, I think the core issue is that many
> of your PRs were not reviewed, so no one realized the problem was found and
> a fix was proposed. The bug remained in the code base and later someone
> else found the same issue and fixed it.
>
> It's an unfortunate but common problem in any open source community that
> authors with higher reputations get more attention. The fundamental reason
> is we don't have enough eyes. With the number of AI slops increasing these
> days, it's even harder to properly review every single PR and determine
> whether they are valid. Committers tend to treat PRs from other committers
> (or frequent contributors) more seriously, resulting in faster reviews.
>
> I'm not saying this situation will be magically getting much better
> overnight, but I do have a few suggestions that might help you.
>
> First, make sure your PR is ready for review. It's not a criticism of any
> of your PRs, but general guidance for anyone interested in contributing to
> Spark (or any other open source projects). Make sure they are well
> explained, properly tested and pass all tests.
>
> Then, tag some committers. Not everyone, but committers who understand
> your code and own those components. This might take some time initially,
> but you can start by examining commit histories. This will put your PR on
> their radar.
>
> Finally, address committers' comments quickly and tag them again if they
> don't respond. It's an open source project, everyone needs to take some
> initiative and build their reputation. No committer is obligated to review
> every single PR.
>
> I do hope things will get better for you eventually. An open source
> project is a community and it needs efforts from everyone who cares about
> it :)
>
> Tian
>
> On Thu, May 28, 2026 at 8:33 AM Asif Shahid <[email protected]> wrote:
>
>> Hi Peter,
>> Pls see inline for comments/ replies
>>
>> On Thu, May 28, 2026 at 6:11 AM Peter Toth <[email protected]> wrote:
>>
>>> Hey Asif,
>>>
>>> Are you referring to https://github.com/apache/spark/pull/49154/changes
>>> vs. https://github.com/apache/spark/pull/55644/changes? Those are
>>> definitely solving the same issue but I can assure you I wouldn't take any
>>> code from your PR without consulting with you first.
>>>
>> Yes Indeed Peter, I am referring to those.
>> As for the fix, itself, is not indicative of any thing as its a one
>> liner, test has uncanny resemblance.
>>
>>> As far as I remember, I opened SPARK-56694 /
>>> https://github.com/apache/spark/pull/55644 because I ran into that
>>> minor bug during the implementation of
>>> https://github.com/apache/spark/pull/55298.
>>>
>>
>>> Sorry, I didn't check whether a ticket or PR already existed.
>>>
>>
>> The below I am addressing to the whole cartel.:
>> I have experienced this before, as recent as couple of months back (
>> https://issues.apache.org/jira/browse/SPARK-54386)
>> I have experienced, my personal effort ( going into weeks) to debug,
>> reproduce issue reliably , being hijacked by members, without even
>> discussing the fix proposed, ( by opening new PRs).
>> ( If interested, I can
>> provide details of the PRs / issues I am talking about)
>> I have seen a perfectly valid PR being nixed , by following comment which
>> essentially said
>> " my code of making the cache lookup more effective , would result in
>> greater chances of stale cache being picked, which already spark suffers
>> from."
>> Now the PR was related to collapsing the projects in analysis phase, and
>> side effect was cache pick up being more sensitive.
>> So this is such a frivolous reason to nix the PR , because "staleness" is
>> an underlying existing issue which had nothing to do with my PR. And its
>> more amusing , that if a DB is giving even one wrong result in millions,
>> that makes all the results a suspect in any case. It does not matter at
>> what frequency this occurs. To me the real reason was code complexity ( &
>> more likely the loss of control of the code to the outsider).
>>
>> The reason I call this open source community as cartel, is because, I
>> have seen the way it works pretty closely and have experienced it in the
>> email exchanges which happen on this group.
>> For the same PR , same issue, if advertently or inadvertently , other
>> person ( especially a member) gets his changes pushed, by the virtue of his
>> standing/position and the "for profit" company the person works, how would
>> you give the credit to the original person who discovered the issue first /
>> provided the fix?
>> Why are issues filed by some immediately worked upon by members ( some of
>> whom claim to be working full time on spark) ? Is it because certain
>> companies / groups ( for profit companies, mind you ) exert undue
>> control, or the petty newbee has to be in the good books of members ( with
>> the hope that at some point they will also reach that position of power ?)
>>
>> Given the AI advent and such occurrences, how will you give due credit
>> to the original creators and how do you plan to prevent some member for
>> taking up idea of any old open PR ( which for reasons of complexity and non
>> technical reasons) , polishing it up and pushing it as their own?
>>
>> I am also curious , am I the only one who is troubled by all this, or
>> there are others who have experienced it?
>>
>> Regards
>> Asif
>>
>>> If you have further improvements please feel free to open a PR.
>>>
>>> Best,
>>> Peter
>>>
>>> On Thu, May 28, 2026 at 8:20 AM Asif Shahid <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>> I had filed a bug
>>>> https://issues.apache.org/jira/browse/SPARK-45866
>>>>
>>>> I had also opened a PR for the same.
>>>>
>>>> Now I see that the ticket I filed is still open, but the issue has
>>>> been fixed using a new ticket
>>>> https://issues.apache.org/jira/browse/SPARK-56694
>>>>
>>>> and on top of that the bug test and ofcourse the fix ( which in any
>>>> case would be same) has been taken from my PR for
>>>> https://github.com/apache/spark/pull/49154/changes#diff-137d880ff73623bf7a452bb84f9c3dbbb27ba929e7f5e070c6bff68cfc8ec71f
>>>>
>>>> To me this is clear unethical conduct of cartel member, unless I am
>>>> missing some valid reason.
>>>>
>>>> And the irony is that the fix is still incomplete, as I just found and
>>>> filed a new ticket
>>>> https://issues.apache.org/jira/browse/SPARK-57126
>>>>
>>>> I know that atleast some cartel members are insecure and think of OSS
>>>> as their fiefdom, but this sort of behaviour , I never expected.
>>>> Regards
>>>> Asif
>>>>
>>>
