Hi all,

I've been auditing our dev automation to see how the SPIP impacts the
release pipeline. dev/merge_spark_pr.py is heavily coupled to JIRA, and
that coupling will break our workflow if we don't handle the transition
carefully.

Right now, the script depends on the jira Python library and explicitly
calls asf_jira.project_versions("SPARK") to fetch unreleased versions. It
maps branch names such as branch-3.5 to JIRA fix versions and filters out
invalid combinations to keep the history clean. This metadata is the
source of truth for our release notes. If we cut over to GitHub
milestones without a dual-write setup, we risk "metadata drift": fixes
merged to master and backported to branch-3.5 won't show up in the 3.5.x
JIRA reports, leaving our enterprise documentation incomplete.
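
For illustration, here is a minimal sketch of the kind of branch-to-version
mapping described above. It is not the code in dev/merge_spark_pr.py: the
function name and the sample version numbers are hypothetical, and it
assumes the version names are plain x.y.z strings sorted newest first.

```python
def default_fix_versions(merge_branches, unreleased):
    """Pick default fix versions for the branches a PR was merged into.

    `unreleased` is assumed to hold plain "x.y.z" names, newest first.
    """
    selected = []
    for ref in merge_branches:
        if ref == "master":
            # master maps to the newest unreleased version
            selected.append(unreleased[0])
            continue
        prefix = ref.replace("branch-", "") + "."
        matches = [v for v in unreleased if v.startswith(prefix)]
        if matches:
            # oldest unreleased version on that maintenance line
            selected.append(matches[-1])

    # drop duplicates while keeping order
    selected = list(dict.fromkeys(selected))

    # Filter one invalid combination: if x.(y-1).0 is not released yet,
    # a fix cannot also be recorded against x.y.0.
    result = []
    for v in selected:
        major, minor, patch = v.split(".")
        previous = f"{major}.{int(minor) - 1}.0"
        if patch == "0" and previous in selected:
            continue
        result.append(v)
    return result
```

With the made-up list ["4.1.0", "4.0.1", "3.5.8"], a PR merged to master,
branch-4.0 and branch-3.5 would get all three versions. With
["4.1.0", "4.0.0"], a PR merged to master and branch-4.0 would get only
4.0.0.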

To address the concerns raised about fragmented releases, I’m proposing a
"dual-rail" period where we refactor the merge script to update both
trackers simultaneously. This protects the integrity of the 3.5/4.0
maintenance split while we validate the new workflow.
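
A rough sketch of what the dual-write step could look like, under heavy
assumptions: the tracker class below is an in-memory stand-in, not a real
JIRA or GitHub client, and every name in it is hypothetical. The point is
only that the merge script would attempt both writes and report any
tracker that failed, so drift is visible right away.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryTracker:
    """Stand-in for a real tracker client (hypothetical)."""
    name: str
    resolved: dict = field(default_factory=dict)

    def resolve(self, issue_key, fix_versions):
        self.resolved[issue_key] = list(fix_versions)


def dual_write(issue_key, fix_versions, trackers):
    """Record the fix versions in every tracker.

    Returns a list of (tracker name, error message) pairs for the
    trackers that failed; an empty list means both rails are in sync.
    """
    failures = []
    for tracker in trackers:
        try:
            tracker.resolve(issue_key, fix_versions)
        except Exception as exc:
            # keep going so one outage does not block the other tracker
            failures.append((tracker.name, str(exc)))
    return failures
```

A non-empty return value is what a reconciliation job would pick up later.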

We also need to avoid the data corruption Maven hit during their migration.
Their script hit a race condition with GitHub's async import API returning
404s for "pending" issues, which resulted in thousands of duplicate tickets
and the accidental deletion of valid PRs during cleanup. I’m prototyping a
state-aware import tool with a local SQLite cache that makes imports
idempotent, so we don't corrupt our historical metadata.
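
To make the idea concrete, here is a minimal sketch of such a cache. It is
an assumption about how the tool could work, not the prototype itself:
submit and poll are hypothetical callables standing in for the real import
API, and the table layout is invented for this example.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the local import-state cache."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS imports (
               jira_key      TEXT PRIMARY KEY,
               state         TEXT NOT NULL,
               request_id    TEXT NOT NULL,
               github_number INTEGER)"""
    )
    return conn


def import_issue(conn, jira_key, submit, poll):
    """Import one issue at most once; safe to call repeatedly.

    submit(jira_key) -> request id of the async import (hypothetical).
    poll(request_id) -> issue number, or None while still pending.
    """
    row = conn.execute(
        "SELECT state, request_id, github_number FROM imports"
        " WHERE jira_key = ?",
        (jira_key,),
    ).fetchone()
    if row is None:
        # first time we see this key: submit once and remember the request
        request_id = submit(jira_key)
        conn.execute(
            "INSERT INTO imports VALUES (?, 'pending', ?, NULL)",
            (jira_key, request_id),
        )
        conn.commit()
        state, number = "pending", None
    else:
        state, request_id, number = row
    if state == "done":
        return number
    number = poll(request_id)
    if number is None:
        # still pending: report "not yet" instead of resubmitting
        return None
    conn.execute(
        "UPDATE imports SET state = 'done', github_number = ?"
        " WHERE jira_key = ?",
        (number, jira_key),
    )
    conn.commit()
    return number
```

Because a pending request is recorded before anything else happens, a
"not found yet" answer leads to another poll rather than a second
submission, which is the failure mode described above.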

The upside is worth the effort. The Apache Arrow project reported a 39%
jump in mailing list traffic and hit 2,297 commits in a single quarter
(approaching their historical highs) after they moved to GitHub Issues.
You can see the metrics in the official board minutes:
https://whimsy.apache.org/board/minutes/Arrow.html

If the community is interested, I'm happy to help build out this migration
infra and the hybrid release note generator. I'll wait for feedback before
I share any design docs on the GraphQL implementation.

Regards,
Vaquar Khan
https://www.linkedin.com/in/vaquar-khan-b695577/

On Sun, 1 Feb 2026 at 18:45, Jungtaek Lim <[email protected]>
wrote:

> Maybe then we could also revisit the purpose of SPIP.
>
> https://spark.apache.org/improvement-proposals.html
>
>> The purpose of an SPIP is to inform and involve the user community in
>> major improvements to the Spark codebase throughout the development
>> process, to increase the likelihood that user needs are met.
>>
>> SPIPs should be used for significant user-facing or cross-cutting
>> changes, not small incremental improvements. When in doubt, if a committer
>> thinks a change needs an SPIP, it does.
>>
> IMHO this proposal does not warrant SPIP according to the above as the
> proposal is nothing to do with Spark "codebase". The first sentence
> clarifies the purpose of SPIP to be scoped to code changes. It's just that
> we somehow have been using SPIP for a broader purpose, which I doubt the
> stricter voting rule considered the case beyond the purpose of SPIP. For me
> it sounds like an unintentional side effect. Once this proposal no longer
> couples with SPIP, this will go back to the procedural voting process and
> veto does not apply.
>
> On Mon, Feb 2, 2026 at 6:03 AM Mark Hamstra <[email protected]> wrote:
>
>> Ok, perhaps that page should read "...follows the Apache code-change
>> vote process..." or "...follows the typical Apache code-change
>> process...", because the clear intent is that the vote is more
>> restrictive than a procedural vote and that the SPIP cannot pass with
>> any -1 votes from PMC members: "(at least 3 +1 votes from PMC members
>> and no -1 votes from PMC members)".
>>
>> On Sun, Feb 1, 2026 at 12:47 PM Jungtaek Lim
>> <[email protected]> wrote:
>> >
>> > It says “typical” but the voting rule which enables veto is only for
>> > “code change”. “Typically”, non code change does not come “veto” into play.
>> >
>> > https://www.apache.org/foundation/voting.html
>> >
>> > So either SPIP overwrites the voting rule to stricter one, or SPIP does
>> > not aim to handle proposal for non code change. Specifically, this
>> > discussion is “procedural” one IMO.
>> >
>> > Does it make sense? Which one we were intending to?
>> >
>> > On Mon, Feb 2, 2026 at 12:44 AM Mark Hamstra <[email protected]> wrote:
>> >>
>> >> The voting requirements are enumerated in
>> >> https://spark.apache.org/improvement-proposals.html
>> >> Is there something unclear there?
>> >>
>> >> On Fri, Jan 30, 2026 at 2:51 PM Jungtaek Lim
>> >> <[email protected]> wrote:
>> >> >
>> >> > I'm not sure the voting process for SPIP is intentionally enabling
>> >> > the veto. It follows the voting process of "code change", but we have non
>> >> > code change being scoped as SPIP.
>> >> >
>> >> > To PMC members - is it intentional for any SPIP including non-code
>> >> > change to follow the voting process of "code change", or did we imply SPIP
>> >> > only applies to eventual code change and this topic doesn't warrant SPIP?
>> >> >
>> >> > On Sat, Jan 31, 2026 at 7:03 AM Tian Gao <[email protected]> wrote:
>> >> >>
>> >> >> A difference between what Dongjoon proposed and my proposal is -
>> >> >> during this "test phase", is it allowed to submit PRs that are linked to
>> >> >> github issues, instead of JIRA? If it's a yes, then I'm totally fine if we
>> >> >> want to extend this to 3-6 months. If it's a no, I still believe it's a
>> >> >> significant improvement, but we may miss some data about if people feel
>> >> >> more comfortable using github vs JIRA.
>> >> >>
>> >> >> If we only allow github issues to be a discussion forum, then I
>> >> >> don't think it deserves an SPIP - let's just open it.
>> >> >>
>> >> >> If we want to have both work at the same time (at least start
>> >> >> building infra around github issues), we need to sort out some details -
>> >> >> majorly the procedural differences. What information in JIRA tickets do we
>> >> >> really need so we have to require an equivalent component in github issues.
>> >> >> Can we still do release notes properly. Maybe enforce (highly encourage)
>> >> >> committers to use JIRA during that phase so we still have all the major
>> >> >> pieces the same?
>> >> >>
>> >> >> I believe PMC members have the right to veto any SPIP, so I want to
>> >> >> find a common ground here to make some progress.
>> >> >>
>> >> >> Tian Gao
>> >> >>
>> >> >> On Fri, Jan 30, 2026 at 1:35 PM Jungtaek Lim <[email protected]> wrote:
>> >> >>>
>> >> >>> > Speaking as an occasional contributor, I would expect this to be
>> >> >>> > more effort than it’s worth. A phased migration is appealing because it
>> >> >>> > feels safer and more gradual, but I think everyone will be better off in
>> >> >>> > the long run with a speedy and clear cutover from Jira to GitHub.
>> >> >>>
>> >> >>> My main proposal is to construct a way to prove the proposal by
>> >> >>> ourselves instead of just arguing around which one is better from
>> >> >>> experience with other projects', individual's preference of UI/UX, etc.
>> >> >>> Everyone talks from their experience and no one can be on behalf of
>> >> >>> artificial potential contributors and other existing contributors
>> >> >>> (including committers and PMC members). I'm not sure we don't have good
>> >> >>> evidence about changing it as a whole - if all of us had the same
>> >> >>> preference, this discussion thread should have just simply been filled with
>> >> >>> a wave of +1. That didn't happen. The first phase would give data, at least
>> >> >>> for how many issues will be filed from non-code-contributors, which we
>> >> >>> collect the accounts and consider these accounts to have been something we
>> >> >>> should have handled ASF account creation, or even aggressively, consider
>> >> >>> these issues to be non-existed if we didn't migrate.
>> >> >>>
>> >> >>> Also if you look at SPIP doc, the plan is already phased. It's
>> >> >>> just that 2 weeks is incredibly short for many PMC members, which has a
>> >> >>> high chance for them to work nothing about Apache Spark during the time,
>> >> >>> and they have a binding vote to make a decision. IMHO it should be much
>> >> >>> longer than that, a quarter or a half year.
>> >> >>>
>> >> >>> On Sat, Jan 31, 2026 at 3:27 AM Nicholas Chammas <[email protected]> wrote:
>> >> >>>>
>> >> >>>>
>> >> >>>> > On Jan 30, 2026, at 9:27 AM, Szehon Ho <[email protected]> wrote:
>> >> >>>> >
>> >> >>>> > In my experience, because I see few committers discussing
>> >> >>>> > anything technical on Spark JIRA for years as you mentioned (and other
>> >> >>>> > Hadoop project JIRAs too), I feel like nobody will reply if I do, so I will
>> >> >>>> > make a Github PR directly and ping for feedback there.  So in addition to
>> >> >>>> > the UX problem Tian mentioned, it's worsened by cause and effect.  So it's
>> >> >>>> > become a procedure, and we still don't have a good place to discuss without
>> >> >>>> > jumping to code.
>> >> >>>>
>> >> >>>> This has often been my experience as well. The eyes are mainly on
>> >> >>>> GitHub and not Jira.
>> >> >>>>
>> >> >>>> > On Jan 30, 2026, at 12:12 AM, Jungtaek Lim <[email protected]> wrote:
>> >> >>>> >
>> >> >>>> > Why can't we do this in two phases instead of trying to build
>> >> >>>> > Rome in a day?
>> >> >>>>
>> >> >>>> Speaking as an occasional contributor, I would expect this to be
>> >> >>>> more effort than it’s worth. A phased migration is appealing because it
>> >> >>>> feels safer and more gradual, but I think everyone will be better off in
>> >> >>>> the long run with a speedy and clear cutover from Jira to GitHub. The
>> >> >>>> longer the transitional phase lasts, the more confusing it will be to new
>> >> >>>> and occasional contributors who are not following the dev process’s
>> >> >>>> evolution closely.
>> >> >>>>
>> >> >>>> Nick
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> ---------------------------------------------------------------------
>> >> >>>> To unsubscribe e-mail: [email protected]
>> >> >>>>

-- 
Regards,
Vaquar Khan
