The AI spam issue is a big concern, especially given the examples you provided. Maybe we should talk with some of the folks at the projects already dealing with AI spam to hear how they feel it's going and whether they've found any GitHub tools useful for managing it.
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

On Mon, Feb 2, 2026 at 8:02 AM Dongjoon Hyun <[email protected]> wrote:

> I'm glad to see such active discussion on our mailing list, and I really
> appreciate everyone's passion. It's important that we agree on all aspects
> of this topic. To that end, I'd like to share a few additional points
> regarding GitHub Issues in ASF projects. Please consider this a side note
> to the ongoing discussion.
>
> 1. Vulnerability to AI-generated spam.
>
> GitHub Issues are significantly more exposed to low-quality AI content
> ("AI slop"). As the Apache Airflow incident below shows, this problem
> will likely worsen.
>
> https://lists.apache.org/thread/2vmvv429sowq90x96d5w2fxpc298cy3l
> (2025-01-21)
>
> Apache Airflow examples:
> https://github.com/apache/airflow/issues/45867
> https://github.com/apache/airflow/issues/45856
>
> Apache Iceberg example:
> https://github.com/apache/iceberg/issues/12039
>
> 2. Increased administrative overhead.
>
> Adopting GitHub Issues imposes a new burden: committers and PMC members
> will spend more time moderating AI spam. Creating an account on the current
> ASF JIRA now requires PMC approval (a human-in-the-loop defense), whereas
> GitHub accounts are easy to create and abuse. Banning GitHub accounts is a
> tedious and unpleasant task for maintainers.
>
> 3. Accessibility and Vendor Lock-in.
>
> We should not assume all users want or have GitHub accounts. Issue
> reporting is distinct from code contribution. The current ecosystem allows
> anyone with an email address to participate via ASF infrastructure.
> Mandating GitHub Issues forces users onto a third-party commercial
> platform, effectively raising the barrier to entry for bug reporting.
>
> 4. Degraded Traceability.
>
> JIRA's globally unique IDs (SPARK-XXX) are superior for tracking changes
> across the ecosystem. GitHub's short references (#xxx) are ambiguous when
> viewed in forks or downstream repos. To match the precision of a JIRA ID,
> we would need to use full URLs (b) in code or commit messages, which is far
> less concise than (a).
>
> ```
> $ git log --oneline --format=%s | grep SPARK-54943
> [SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Disable `test_pyarrow_array_cast` for now
> Revert "[SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Mistake Commit"
> [SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Disable `test_pyarrow_array_cast`
> [SPARK-54943][PYTHON][TESTS] Add test coverage for `pa.Array.cast` with `safe=False`
> ```
>
> Specifically, suppose Spark's other repositories or downstream projects
> need to reference an issue in the main Spark repository. The current
> convention is to cite the JIRA ID, as shown below. Since (a) is already
> widely recognized, switching to (b) would be a regression.
>
> https://github.com/apache/spark-kubernetes-operator/blob/3cf1ec3fa09c1934efaf0e15b5a468348aa95faf/examples/pi-on-volcano.yaml#L29
>
> ```
> # This requires SPARK-54916 (or an image with `volcano`-enabled distribution)
> ```
>
> (a) JIRA: SPARK-54916
> (b) GitHub Issue ID: https://github.com/apache/spark/issues/54916
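To make point 4 concrete, here is a minimal shell sketch. It assumes a hypothetical `resolve_ref` helper (not part of any Spark or GitHub tooling) that resolves a reference the way a reader would: a JIRA key always maps to the same ASF JIRA page, while a bare `#N` is resolved against whichever repository the text lives in, which is how GitHub autolinks short references.

```shell
#!/bin/sh
# resolve_ref REF CURRENT_REPO
# Hypothetical helper for illustration only. Handles just the two
# reference styles discussed in point 4.
resolve_ref() {
  ref=$1
  current_repo=$2
  case $ref in
    SPARK-[0-9]*)
      # A JIRA key names its project, so it resolves identically everywhere.
      echo "https://issues.apache.org/jira/browse/$ref" ;;
    '#'[0-9]*)
      # A short reference has no repository part, so it is resolved
      # relative to the repository containing the text.
      # ${ref#?} drops the leading '#'.
      echo "https://github.com/$current_repo/issues/${ref#?}" ;;
    *)
      echo "unrecognized reference: $ref" >&2
      return 1 ;;
  esac
}

# The same comment text, read in the main repo and in a downstream repo.
resolve_ref SPARK-54916 apache/spark
resolve_ref SPARK-54916 apache/spark-kubernetes-operator
resolve_ref '#54916' apache/spark
resolve_ref '#54916' apache/spark-kubernetes-operator
```

Read in either repository, `SPARK-54916` points at the same issue, while the identical text `#54916` points at two different repositories' trackers, which is why a full URL would be needed to get the same precision.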
