My 2c -

IMHO, the argument that GitHub is more exposed to AI spam than ASF JIRA
does not sound very strong.

It sounds to me like just a matter of "motivation": with the GitHub API an
attacker can mess up GitHub repos at large, while they need to deal with the
JIRA API to mess up the ASF JIRA systems. An attacker may find GitHub repos
more interesting to attack, since account creation is easier and a single
effort reaches many more communities. But if they decide to attack ASF JIRA
anyway, the attack code will simply be generated by AI (a significant drop
in effort), and the difficulty wouldn't be noticeably different. They just
need to make the account creation request look valid. A human reviewer can
never know the intention behind an account creation request, so this only
slows them down "a bit", until we finish the diligence and create the
account. I don't know the details of how account creation requests are
handled, but it is technically feasible that we do our due diligence and
still create an account that is then used to attack "other ASF projects".
(Which criteria do we use to validate an account creation request? Are they
ever sufficient to prevent fake account creation?)

In addition, once they have access to the system they won't want to impact
only the Apache Spark project - that isn't worth the effort. On the flip
side, collective detection efforts by the community, together with spam
filtering systems on the infrastructure side, would help GitHub (as a
service vendor) find the account(s) and block them, and probably apply a
broader block (e.g. an IP range) if there is a pattern of access. In terms
of infrastructure, there is no evidence that GitHub as a service vendor
cares less than ASF INFRA about protecting the system from spam, which may
drive the conclusion about which system is more vulnerable.

It's a valid concern that there were AI slop issues among ASF projects using
GitHub Issues in 2025. Are we seeing this more frequently? If it happened
only once in 2025, IMHO, it's manageable through collaborative efforts with
other ASF projects.


On Wed, Feb 4, 2026 at 6:38 AM Tian Gao via dev <[email protected]>
wrote:

> For AI generated spam - I've been working on the CPython repo, which is
> completely public, for more than 2 years. There are AI generated
> issues/PRs, as well as human spam, but it's never been a huge problem.
> GitHub as a platform is working on it, and for most issues we can just close
> them (it really does not happen that much).
>
> From another perspective, every single registration request to JIRA is
> "spam". It requires the PMC's attention and the PMC has to decide whether
> it's a "legit" request. If we approve all requests, that's like
> no-human-in-the-loop. Otherwise, how could a PMC determine which request is
> legit? It takes time right? According to the data from the other thread, we
> had 492 "spam" last year - it's unlikely we have this many AI generated
> issues/PRs if we migrate to github.
>
> Tian Gao
>
> On Mon, Feb 2, 2026 at 5:55 AM Wenchen Fan <[email protected]> wrote:
>
>> Apache Spark is already using Github heavily to accept code changes, so
>> 1, 2, and 3 do not seem to be an issue to me. AI spam can already happen
>> with PRs (writing code is cheap today with LLM) and opening Github Issues
>> won't make a big difference.
>>
>> > 4. Degraded Traceability.
>>
>> This makes sense. Until we find a better solution with a smooth migration
>> plan, I think we should still open JIRA tickets for code changes that are
>> worth tracing. For now we only allow minor changes to omit JIRA; maybe we
>> should extend that, e.g. only create JIRA tickets for changes that are worth
>> mentioning in the release notes. We can follow the common practice from
>> other Apache projects.
>>
>> On Mon, Feb 2, 2026 at 9:33 PM Nicholas Chammas <
>> [email protected]> wrote:
>>
>>>
>>> > On Feb 2, 2026, at 2:02 AM, Dongjoon Hyun <[email protected]> wrote:
>>> >
>>> > Adopting GitHub Issues imposes a new burden. Committers and PMC
>>> members will face increased overhead in moderating AI spam. Unlike the
>>> current ASF JIRA system, which now requires PMC approval for account
>>> creation (a human-in-the-loop defense), GitHub accounts are easier to
>>> create and abuse. Banning GitHub accounts is a tedious and unpleasant task
>>> for maintainers.
>>>
>>> Banning accounts should not be that tedious. It’s just a couple of
>>> clicks, no?
>>>
>>> And accounts that are creating AI slop are likely doing it on other
>>> projects, meaning that GitHub will already be getting signals about an
>>> account’s quality before any Spark admin takes action. Severe offenders are
>>> likely to be banned globally by GitHub.
>>>
>>> Being on GitHub also means that we will have access to more tools that
>>> are integrated with the GitHub ecosystem to help manage problems like
>>> these, should we need them.
>>>
>>> > 3. Accessibility and Vendor Lock-in.
>>> >
>>> > We should not assume all users want or have GitHub accounts. Issue
>>> reporting is distinct from code contribution. The current ecosystem allows
>>> anyone with an email to participate via ASF infrastructure. Mandating
>>> GitHub Issues forces users onto a third-party commercial platform,
>>> effectively raising the barrier to entry for bug reporting.
>>>
>>> This argument doesn’t make sense to me. Earlier you were warning that
>>> lowering the barrier to entry will lead to more moderation overhead, but
>>> here you are arguing the opposite, that GitHub raises the barrier to entry.
>>>
>>> GitHub is so widely used in our industry that it’s reasonable to assume
>>> that it offers the lowest barrier to entry of any code collaboration
>>> platform. Reporting issues is a basic part of that.
>>>
>>> I think it would be difficult to find anyone, especially a new
>>> contributor, who feels that ASF Jira is more accessible than GitHub.
>>>
>>> Nick
