I'm glad to see such active discussion on our mailing list, and I really
appreciate everyone's passion. It's important that we agree on all aspects
of this topic. To that end, I'd like to share a few additional points
regarding GitHub Issues in ASF projects. Please consider this a side note
to the ongoing discussion.
1. Vulnerability to AI-generated spam.
GitHub Issues are significantly more exposed to low-quality AI content ("AI
slop"). As seen in the Apache Airflow incident below, this problem will
likely worsen.
https://lists.apache.org/thread/2vmvv429sowq90x96d5w2fxpc298cy3l
(2025-01-21)
Apache Airflow examples:
https://github.com/apache/airflow/issues/45867
https://github.com/apache/airflow/issues/45856
Apache Iceberg example:
https://github.com/apache/iceberg/issues/12039
2. Increased administrative overhead.
Adopting GitHub Issues imposes a new burden. Committers and PMC members
will face increased overhead in moderating AI spam. Unlike the current ASF
JIRA system, which now requires PMC approval for account creation (a
human-in-the-loop defense), GitHub accounts are easier to create and abuse.
Banning GitHub accounts is a tedious and unpleasant task for maintainers.
3. Accessibility and vendor lock-in.
We should not assume all users want or have GitHub accounts. Issue
reporting is distinct from code contribution. The current ecosystem allows
anyone with an email to participate via ASF infrastructure. Mandating
GitHub Issues forces users onto a third-party commercial platform,
effectively raising the barrier to entry for bug reporting.
4. Degraded traceability.
JIRA's globally unique IDs (SPARK-XXX) are superior for tracking changes
across the ecosystem. GitHub's short references (#xxx) are ambiguous when
viewed in forks or downstream repos. To achieve the same precision as a
JIRA ID (a), we would need to use full URLs (b) in code and commit messages,
which are far less concise. With JIRA IDs, a plain grep is enough to find
every commit related to an issue:
```
$ git log --oneline --format=%s | grep SPARK-54943
[SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Disable `test_pyarrow_array_cast` for now
Revert "[SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Mistake Commit"
[SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Disable `test_pyarrow_array_cast`
[SPARK-54943][PYTHON][TESTS] Add test coverage for `pa.Array.cast` with `safe=False`
```
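As a rough illustration of why the bracketed prefix is convenient for tooling, here is a minimal Python sketch that groups commit subjects by JIRA ID. The regular expression and the `group_by_jira_id` helper are my own assumptions for illustration; they are not part of Spark's tooling.

```python
import re
from collections import defaultdict

# Assumed pattern: an upper-case project key, a hyphen, and a number
# (for example SPARK-54943). Tags such as FOLLOW-UP do not match.
JIRA_ID = re.compile(r"\b[A-Z][A-Z0-9]*-\d+\b")


def group_by_jira_id(subjects):
    """Map each JIRA ID to the commit subjects that mention it."""
    groups = defaultdict(list)
    for subject in subjects:
        # dict.fromkeys drops repeated IDs while keeping first-seen order.
        for jira_id in dict.fromkeys(JIRA_ID.findall(subject)):
            groups[jira_id].append(subject)
    return dict(groups)


subjects = [
    'Revert "[SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Mistake Commit"',
    "[SPARK-54943][PYTHON][TESTS] Add test coverage for `pa.Array.cast` with `safe=False`",
]
for jira_id, matched in group_by_jira_id(subjects).items():
    print(jira_id, len(matched))
```

The same ID works unchanged in any repository, because the project key is part of the identifier.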
Specifically, suppose that Spark's other repositories or downstream
projects need to reference an issue in the main Spark repository. The
current convention is to cite the JIRA ID, as in the example below.
Given that (a) is already well recognized, switching to (b) would be a
regression.
https://github.com/apache/spark-kubernetes-operator/blob/3cf1ec3fa09c1934efaf0e15b5a468348aa95faf/examples/pi-on-volcano.yaml#L29
```
# This requires SPARK-54916 (or an image with `volcano`-enabled distribution)
```
(a) JIRA: SPARK-54916
(b) GitHub Issue ID: https://github.com/apache/spark/issues/54916
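To make the ambiguity concrete, here is a minimal Python sketch of how each reference style resolves to a URL depending on where it is read. The `resolve` helper and the fork name `example-user/spark` are hypothetical. The URL patterns are the usual ASF JIRA and GitHub issue layouts, and I am assuming that a short `#NNN` reference is interpreted relative to the repository it is viewed in.

```python
import re

JIRA_BASE = "https://issues.apache.org/jira/browse"
GITHUB_BASE = "https://github.com"


def resolve(reference, viewing_repo):
    """Return the issue URL a reference points to when read inside viewing_repo."""
    if re.fullmatch(r"[A-Z][A-Z0-9]*-\d+", reference):
        # A JIRA ID carries its project key, so the repository is irrelevant.
        return f"{JIRA_BASE}/{reference}"
    if re.fullmatch(r"#\d+", reference):
        # A short GitHub reference is relative to the repository it is read in.
        return f"{GITHUB_BASE}/{viewing_repo}/issues/{reference[1:]}"
    raise ValueError(f"unrecognized reference: {reference}")


# "example-user/spark" is a made-up fork used only for illustration.
for repo in ("apache/spark", "apache/spark-kubernetes-operator", "example-user/spark"):
    print(repo, resolve("SPARK-54916", repo), resolve("#54916", repo))
```

In this sketch, the JIRA ID resolves to the same URL in all three repositories, while `#54916` resolves to three different ones.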