Hi all,

Thank you for all the discussion on this topic. I think we can all agree
that people care about it, whatever their positions are.

Like I said above, what I want is to make progress toward a more involved
community, not to figure out the "best solution" and shoot for the moon.
So I have a new proposal that might make more people happy.

1. Open GitHub issues.
2. Make an issue template that requires fields similar to those on current
JIRA tickets (issue type, component?, affected versions). I added a question
mark here because a reporter might not know how we categorize "components".
I don't know how this field is used in our process. If it only helps with
triaging, committers can add the label later.
3. Create labels to categorize issues, for example "component-xx" for
different components. We can also have labels for different Spark versions.
4. Still require a JIRA ticket for all PRs (for now; let's discuss whether we
can relax this restriction in a separate thread).
5. Build a GitHub bot that picks up a special label on GitHub issues
("create-jira", for example) and duplicates that GitHub issue in the JIRA
system. So for any GitHub issue that we believe is valid and should be
worked on, all we need to do is add a label to it, and a matching JIRA
issue will be created for all our existing infra.
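
To make steps 2, 3 and 5 a bit more concrete, here is a rough sketch of the
decision logic such a bot could run. It is only an illustration: the label
conventions ("component-", "version-"), the function name, and the shape of
the returned dict are assumptions, and the actual GitHub and JIRA API calls
are deliberately left out.

```python
from typing import Optional

TRIGGER_LABEL = "create-jira"      # example label name from step 5
COMPONENT_PREFIX = "component-"    # assumed label convention from step 3
VERSION_PREFIX = "version-"        # hypothetical convention for affected versions


def jira_fields_for_issue(issue: dict, already_mirrored: set) -> Optional[dict]:
    """Return fields for a new JIRA ticket, or None if none should be created.

    `issue` is a simplified stand-in for a GitHub issue payload:
    {"number": int, "title": str, "body": str, "html_url": str, "labels": [str]}
    `already_mirrored` holds issue numbers that already have a JIRA ticket.
    """
    labels = issue.get("labels", [])
    if TRIGGER_LABEL not in labels:
        return None  # nobody has marked this issue as valid yet
    if issue["number"] in already_mirrored:
        return None  # do not create a duplicate if the label is re-applied

    components = [
        label[len(COMPONENT_PREFIX):]
        for label in labels
        if label.startswith(COMPONENT_PREFIX)
    ]
    versions = [
        label[len(VERSION_PREFIX):]
        for label in labels
        if label.startswith(VERSION_PREFIX)
    ]
    return {
        "project": "SPARK",
        "summary": issue["title"],
        "description": f"{issue.get('body') or ''}\n\nMirrored from {issue['html_url']}",
        "components": components,
        "affected_versions": versions,
    }
```

The point of keeping this as a pure function is that the mapping from labels
to JIRA fields can be reviewed and tested without touching either system.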

I know people have different opinions about how to avoid duplicating
information between JIRA and PRs and how to improve the merging experience.
These are all relevant to how we use GitHub as a platform, but let's find our
common ground and act on that first. We'll have more data once we move one
step forward.

Also if we decide to do this, I don't believe it requires an SPIP. It's a
procedural change (and a rather easy one). What should be the process to
land this?

Tian Gao

On Sun, Feb 1, 2026 at 5:41 PM vaquar khan <[email protected]> wrote:

> Hi all,
>
> I've been auditing our dev automation to see how the SPIP impacts the
> release pipeline. We have some heavy JIRA coupling in
> dev/merge_spark_pr.py that will break our workflow if we don't handle
> the transition surgically.
>
> Right now, the script is locked into the jira python lib and explicitly
> calls asf_jira.project_versions("SPARK") to fetch unreleased versions. It
> uses specific logic to map branch names like branch-3.5 to JIRA versions
> and filters invalid combinations to keep history clean. This metadata is
> the source of truth for our release notes. If we cut over to GitHub
> milestones without a dual-write setup, we risk "metadata drift" where fixes
> merged to master won't show up in the 3.5.x JIRA reports, making our
> enterprise documentation incomplete.
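
For readers who have not looked at the script, the kind of branch-to-version
mapping described here can be sketched as below. This is not the actual logic
in dev/merge_spark_pr.py. The function names and the selection rule (newest
unreleased version for master, oldest unreleased patch version for a
maintenance branch) are assumptions made purely for illustration.

```python
import re
from typing import List, Optional


def version_key(name: str) -> tuple:
    """Turn '3.5.8' into (3, 5, 8) so versions compare numerically."""
    return tuple(int(part) for part in name.split("."))


def default_fix_version(branch: str, unreleased: List[str]) -> Optional[str]:
    """Pick a default fix version for a merge into `branch` (illustrative rule)."""
    # Ignore version names that are not plain X.Y.Z.
    candidates = [v for v in unreleased if re.fullmatch(r"\d+\.\d+\.\d+", v)]
    if branch == "master":
        return max(candidates, key=version_key, default=None)
    match = re.fullmatch(r"branch-(\d+\.\d+)", branch)
    if match is None:
        return None  # not a release branch we know how to map
    prefix = match.group(1) + "."
    matching = [v for v in candidates if v.startswith(prefix)]
    return min(matching, key=version_key, default=None)
```

Any replacement for the JIRA lookup would need to feed an equivalent list of
unreleased versions into a rule like this one.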
>
> To address the concerns raised about fragmented releases, I’m proposing a
> "dual-rail" period where we refactor the merge script to update both
> trackers simultaneously. This protects the integrity of the 3.5/4.0
> maintenance split while we validate the new workflow.
>
> We also need to avoid the data corruption Maven hit during their
> migration. Their script hit a race condition with GitHub's async import API
> returning 404s for "pending" issues, which resulted in thousands of
> duplicate tickets and the accidental deletion of valid PRs during cleanup.
> I’m prototyping a state-aware import tool with a local SQLite cache to
> handle the idempotency so we don't corrupt our historical metadata.
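
As a rough idea of what "state-aware" with a SQLite cache could mean, here is
a minimal sketch that records each ticket as pending before it is submitted,
so a retry never submits the same ticket twice. It is not the prototype
mentioned above. The table layout, the function names and the "SPARK-1" style
keys used with it are hypothetical.

```python
import sqlite3
from typing import Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local cache that tracks import state per ticket."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS imports ("
        " jira_key TEXT PRIMARY KEY,"
        " state TEXT NOT NULL,"        # 'pending' or 'done'
        " github_number INTEGER)"
    )
    return conn


def should_submit(conn: sqlite3.Connection, jira_key: str) -> bool:
    """Mark the key as pending; return True only the first time it is seen."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO imports (jira_key, state) VALUES (?, 'pending')",
            (jira_key,),
        )
    return cur.rowcount == 1


def mark_done(conn: sqlite3.Connection, jira_key: str, github_number: int) -> None:
    """Record the GitHub issue number once the import is confirmed."""
    with conn:
        conn.execute(
            "UPDATE imports SET state = 'done', github_number = ? WHERE jira_key = ?",
            (github_number, jira_key),
        )


def state_of(conn: sqlite3.Connection, jira_key: str) -> Optional[str]:
    """Return 'pending', 'done', or None if the key was never submitted."""
    row = conn.execute(
        "SELECT state FROM imports WHERE jira_key = ?", (jira_key,)
    ).fetchone()
    return row[0] if row else None
```

With this shape, a ticket that is still pending on the remote side is polled
again rather than re-submitted, because the local cache already knows about it.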
>
> The upside is worth the effort—the Apache Arrow project reported a 39%
> jump in mailing list traffic and hit 2,297 commits in a single quarter
> (approaching their historical highs) after they moved to GitHub Issues. You
> can see the metrics in the official board minutes here:
> https://whimsy.apache.org/board/minutes/Arrow.html.
>
> If the community is interested, I'm happy to help build out this migration
> infra and the hybrid release note generator. I'll wait for feedback before
> I share any design docs on the GraphQL implementation.
>
> Regards,
> Vaquar Khan
> https://www.linkedin.com/in/vaquar-khan-b695577/
>
> On Sun, 1 Feb 2026 at 18:45, Jungtaek Lim <[email protected]>
> wrote:
>
>> Maybe then we could also revisit the purpose of SPIP.
>>
>> https://spark.apache.org/improvement-proposals.html
>>
>>> The purpose of an SPIP is to inform and involve the user community in
>>> major improvements to the Spark codebase throughout the development
>>> process, to increase the likelihood that user needs are met.
>>>
>>> SPIPs should be used for significant user-facing or cross-cutting
>>> changes, not small incremental improvements. When in doubt, if a committer
>>> thinks a change needs an SPIP, it does.
>>>
>> IMHO this proposal does not warrant an SPIP according to the above, as the
>> proposal has nothing to do with the Spark "codebase". The first sentence
>> scopes the purpose of SPIPs to code changes. It's just that we have somehow
>> been using SPIPs for a broader purpose, and I doubt the stricter voting
>> rule was written with cases beyond that purpose in mind. To me it looks
>> like an unintentional side effect. Once this proposal is no longer coupled
>> to an SPIP, it goes back to the procedural voting process, where vetoes do
>> not apply.
>>
>> On Mon, Feb 2, 2026 at 6:03 AM Mark Hamstra <[email protected]>
>> wrote:
>>
>>> Ok, perhaps that page should read "...follows the Apache code-change
>>> vote process..." or "...follows the typical Apache code-change
>>> process...", because the clear intent is that the vote is more
>>> restrictive than a procedural vote and that the SPIP cannot pass with
>>> any -1 votes from PMC members: "(at least 3 +1 votes from PMC members
>>> and no -1 votes from PMC members)".
>>>
>>> On Sun, Feb 1, 2026 at 12:47 PM Jungtaek Lim
>>> <[email protected]> wrote:
>>> >
>>> > It says “typical”, but the voting rule that enables a veto applies only
>>> > to “code changes”. “Typically”, a veto does not come into play for
>>> > non-code changes.
>>> >
>>> > https://www.apache.org/foundation/voting.html
>>> >
>>> > So either the SPIP process overrides the voting rule with a stricter one,
>>> > or SPIPs are not meant to handle proposals for non-code changes. This
>>> > discussion in particular is a “procedural” one, IMO.
>>> >
>>> > Does that make sense? Which one did we intend?
>>> >
>>> > On Mon, Feb 2, 2026 at 12:44 AM Mark Hamstra <[email protected]> wrote:
>>> >>
>>> >> The voting requirements are enumerated in
>>> >> https://spark.apache.org/improvement-proposals.html
>>> >> Is there something unclear there?
>>> >>
>>> >> On Fri, Jan 30, 2026 at 2:51 PM Jungtaek Lim
>>> >> <[email protected]> wrote:
>>> >> >
>>> >> > I'm not sure the SPIP voting process intentionally enables the
>>> >> > veto. It follows the voting process for "code changes", but we have
>>> >> > non-code changes being scoped as SPIPs.
>>> >> >
>>> >> > To PMC members: is it intentional for every SPIP, including non-code
>>> >> > changes, to follow the voting process for "code changes", or did we mean
>>> >> > that SPIPs only apply to eventual code changes, so this topic doesn't
>>> >> > warrant an SPIP?
>>> >> >
>>> >> > On Sat, Jan 31, 2026 at 7:03 AM Tian Gao <[email protected]>
>>> wrote:
>>> >> >>
>>> >> >> A difference between what Dongjoon proposed and my proposal is -
>>> during this "test phase", is it allowed to submit PRs that are linked to
>>> github issues, instead of JIRA? If it's a yes, then I'm totally fine if we
>>> want to extend this to 3-6 months. If it's a no, I still believe it's a
>>> significant improvement, but we may miss some data about if people feel
>>> more comfortable using github vs JIRA.
>>> >> >>
>>> >> >> If we only allow github issues to be a discussion forum, then I
>>> don't think it deserves an SPIP - let's just open it.
>>> >> >>
>>> >> >> If we want both to work at the same time (or at least start
>>> >> >> building infra around GitHub issues), we need to sort out some details,
>>> >> >> mainly the procedural differences. What information in JIRA tickets do we
>>> >> >> really need, such that we must require an equivalent field in GitHub
>>> >> >> issues? Can we still do release notes properly? Maybe we require (or
>>> >> >> strongly encourage) committers to use JIRA during that phase so that all
>>> >> >> the major pieces stay the same?
>>> >> >>
>>> >> >> I believe PMC members have the right to veto any SPIP, so I want
>>> to find a common ground here to make some progress.
>>> >> >>
>>> >> >> Tian Gao
>>> >> >>
>>> >> >> On Fri, Jan 30, 2026 at 1:35 PM Jungtaek Lim <
>>> [email protected]> wrote:
>>> >> >>>
>>> >> >>> > Speaking as an occasional contributor, I would expect this to
>>> be more effort than it’s worth. A phased migration is appealing because it
>>> feels safer and more gradual, but I think everyone will be better off in
>>> the long run with a speedy and clear cutover from Jira to GitHub.
>>> >> >>>
>>> >> >>> My main proposal is to build a way to test the proposal
>>> >> >>> ourselves, instead of just arguing about which option is better based on
>>> >> >>> experience with other projects, individual UI/UX preferences, etc.
>>> >> >>> Everyone speaks from their own experience, and no one can speak on behalf
>>> >> >>> of hypothetical potential contributors or of other existing contributors
>>> >> >>> (including committers and PMC members). I'm not sure we have good
>>> >> >>> evidence for changing it as a whole. If all of us had the same
>>> >> >>> preference, this discussion thread would simply have filled up with
>>> >> >>> a wave of +1s. That didn't happen. The first phase would give us data, at
>>> >> >>> least on how many issues are filed by non-code contributors. We could
>>> >> >>> collect those accounts and treat them as ones for which we would otherwise
>>> >> >>> have had to handle ASF account creation or, more aggressively, treat
>>> >> >>> those issues as ones that would not have existed had we not migrated.
>>> >> >>>
>>> >> >>> Also, if you look at the SPIP doc, the plan is already phased. It's
>>> >> >>> just that 2 weeks is incredibly short for many PMC members, who may well
>>> >> >>> do no Apache Spark work at all during that time, yet they hold the
>>> >> >>> binding votes that make the decision. IMHO it should be much longer than
>>> >> >>> that: a quarter or half a year.
>>> >> >>>
>>> >> >>> On Sat, Jan 31, 2026 at 3:27 AM Nicholas Chammas <
>>> [email protected]> wrote:
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> > On Jan 30, 2026, at 9:27 AM, Szehon Ho <
>>> [email protected]> wrote:
>>> >> >>>> >
>>> >> >>>> > In my experience, because I see few committers discussing
>>> anything technical on Spark JIRA for years as you mentioned (and other
>>> Hadoop project JIRAs too), I feel like nobody will reply if I do, so I will
>>> make a Github PR directly and ping for feedback there.  So in addition to
>>> the UX problem Tian mentioned, it's worsened by cause and effect.  So it's
>>> become a procedure, and we still don't have a good place to discuss without
>>> jumping to code.
>>> >> >>>>
>>> >> >>>> This has often been my experience as well. The eyes are mainly
>>> on GitHub and not Jira.
>>> >> >>>>
>>> >> >>>> > On Jan 30, 2026, at 12:12 AM, Jungtaek Lim <
>>> [email protected]> wrote:
>>> >> >>>> >
>>> >> >>>> > Why can't we do this in two phases instead of trying to build
>>> Rome in a day?
>>> >> >>>>
>>> >> >>>> Speaking as an occasional contributor, I would expect this to be
>>> more effort than it’s worth. A phased migration is appealing because it
>>> feels safer and more gradual, but I think everyone will be better off in
>>> the long run with a speedy and clear cutover from Jira to GitHub. The
>>> longer the transitional phase lasts, the more confusing it will be to new
>>> and occasional contributors who are not following the dev process’s
>>> evolution closely.
>>> >> >>>>
>>> >> >>>> Nick
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> ---------------------------------------------------------------------
>>> >> >>>> To unsubscribe e-mail: [email protected]
>>> >> >>>>
>>> >>
>>> >>
>>>
>>>
>>>
>
> --
> Regards,
> Vaquar Khan
>
>
