For 1 and 2 (I believe they are the same issue), I agree. It's a tradeoff:
you can't have both a low barrier for users and strong protection against
AI-generated issues. That's a concern to weigh when migrating to github
issues, not a reason to rule it out. If we only allowed committers to
create issues, we would have even better protection against AI-generated
content, but no community.

For 3, I'm not sure if you are talking about ASF JIRA or the mailing list.
Yes, as long as users have an email address they can join the discussion on
the mailing list, and here's the number of emails we have on the user
mailing list:

[image: image.png]

I think there's a pattern here that anyone can observe.

BTW, I can't use my personal email to subscribe to the dev/user mailing
lists, and I don't know who I should contact about this. I would be a user
who is shut out of the discussion without anyone realizing it.

If you were referring to ASF JIRA, the number of github users is far
greater than the number of ASF JIRA users. Also, registering for a github
account is easier than registering for a JIRA account, as you mentioned
in 1. I think requiring users to report on ASF JIRA sets a higher bar than
requiring them to do it on github.

For 4, that is a problem, but there are simple solutions. For example,
SPARKGH-xxxxx, even though 2 letters longer, can refer to spark github
issues globally.
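
As a rough sketch of how such a prefix could stay globally resolvable, a
few lines of Python can map each reference to its full issue URL. The
SPARKGH name and the find_issue_urls helper are hypothetical, nothing the
project has agreed on; the URL layout is the apache/spark one quoted below.

```python
import re

# Assumption: "SPARKGH-<number>" is only the prefix floated in this thread,
# not an adopted convention. The URL layout follows the apache/spark issue
# link quoted later in this email.
ISSUE_URL = "https://github.com/apache/spark/issues/{number}"
SPARKGH_REF = re.compile(r"\bSPARKGH-(\d+)\b")


def find_issue_urls(text):
    """Return the GitHub issue URL for every SPARKGH-<number> found in text."""
    return [ISSUE_URL.format(number=m.group(1)) for m in SPARKGH_REF.finditer(text)]


# Example: a comment in a downstream repo that points back at a Spark issue.
print(find_issue_urls("# This requires SPARKGH-54916 (or a volcano-enabled image)"))
```

The short form stays greppable in commit messages and comments, and the
full URL is only derived when someone needs to follow the reference.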

Of course, if we are talking about - oh what if github died in the future -
which is a possibility - then that's a valid point.

I want to share something that works for CPython. In CPython, there are
core devs, the equivalent of our committers, who have write access to the
repo. There is also a role that comes before core dev: the triager. A
triager can label or close github issues. The triager title recognizes
users who have been contributing to the repository and are willing to do
more for the project.

I'm not saying we should add the same role for spark, but I don't believe
PMC members would have to be swamped by AI-generated issues every day if we
go with github issues.

I don't think github issues is the perfect solution, but I do believe it's
an improvement, a significant one if I may say. I don't believe we should
do something because other people are doing it, but if most people are
walking in the same direction, we might as well think about it - we can't
be the only smart people in the world. I have listed multiple major ASF
projects that migrated from JIRA to github issues, but I can't yet find a
single project that went the other way recently (airflow did github -> JIRA
in 2016 because of Apache incubator and they migrated back later).

We can list hundreds of problems with migrating to github issues, just as
we can list as many with the current system. But what really matters is
whether it's worth it - whether the benefit outweighs the cost. Are we
staying with JIRA because it's better, or because "this piece of code has
been like this for 10 years"?

Tian Gao

On Sun, Feb 1, 2026 at 11:02 PM Dongjoon Hyun <[email protected]> wrote:

> I'm glad to see such active discussion on our mailing list, and I really
> appreciate everyone's passion. It's important that we agree on all aspects
> of this topic. To that end, I'd like to share a few additional points
> regarding GitHub Issues in ASF projects. Please consider this a side note
> to the ongoing discussion.
>
> 1. Vulnerability to AI-generated spam.
>
> GitHub Issues are significantly more exposed to low-quality AI content
> ("AI slop"). As seen in the Apache Airflow incident below, this problem
> will likely worsen.
>
> https://lists.apache.org/thread/2vmvv429sowq90x96d5w2fxpc298cy3l
> (2025-01-21)
>
> Apache Airflow examples:
> https://github.com/apache/airflow/issues/45867
> https://github.com/apache/airflow/issues/45856
>
> Apache Iceberg example:
> https://github.com/apache/iceberg/issues/12039
>
>
> 2. Increased administrative overhead.
>
> Adopting GitHub Issues imposes a new burden. Committers and PMC members
> will face increased overhead in moderating AI spam. Unlike the current ASF
> JIRA system, which now requires PMC approval for account creation (a
> human-in-the-loop defense), GitHub accounts are easier to create and abuse.
> Banning GitHub accounts is a tedious and unpleasant task for maintainers.
>
>
> 3. Accessibility and Vendor Lock-in.
>
> We should not assume all users want or have GitHub accounts. Issue
> reporting is distinct from code contribution. The current ecosystem allows
> anyone with an email to participate via ASF infrastructure. Mandating
> GitHub Issues forces users onto a third-party commercial platform,
> effectively raising the barrier to entry for bug reporting.
>
>
> 4. Degraded Traceability.
>
> JIRA's globally unique IDs (SPARK-XXX) are superior for tracking changes
> across the ecosystem. GitHub's short references (#xxx) are ambiguous when
> viewed in forks or downstream repos. To achieve the same precision as a
> JIRA ID, we would need to use full URLs (b) in the code or commit messages,
> which is far less concise than (a)
>
> ```
> $ git log --oneline --format=%s | grep SPARK-54943
> [SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Disable `test_pyarrow_array_cast` for now
> Revert "[SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Mistake Commit"
> [SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Disable `test_pyarrow_array_cast`
> [SPARK-54943][PYTHON][TESTS] Add test coverage for `pa.Array.cast` with `safe=False`
> ```
>
> Specifically, let's assume that Spark's other repositories or downstream
> projects require adding a reference to an issue in the Spark main
> repository. The current convention is to refer to JIRA IDs as follows.
> Given that (a) is recognized already well, (b) is a kind of regression.
>
>
> https://github.com/apache/spark-kubernetes-operator/blob/3cf1ec3fa09c1934efaf0e15b5a468348aa95faf/examples/pi-on-volcano.yaml#L29
>
> ```
> # This requires SPARK-54916 (or an image with `volcano`-enabled distribution)
> ```
>
> (a) JIRA: SPARK-54916
> (b) GitHub Issue ID: https://github.com/apache/spark/issues/54916
>
