Re: [DISCUSS] AI-Assisted Contributions and AI Tooling Support in Apache Flink

Martijn Visser Mon, 23 Mar 2026 04:58:15 -0700

If there are no more comments, I'll start a vote later this week.

On Mon, Mar 16, 2026 at 1:22 PM Martijn Visser <[email protected]>
wrote:


> Hi all,
>
> Thanks for all the feedback and support. I've opened a draft PR [1] that
> covers points 1 and 2 from the original proposal.
>
> What's in the PR:
>
> 1. The PR includes an AGENTS.md at the repository root with prerequisites,
> build/test commands, repository structure, architecture boundaries, common
> change patterns, coding standards, testing standards, commit conventions,
> and boundaries. It also updates the PR template with a dedicated AI
> disclosure section (checkbox + Generated-by tag).
> 2. Module-level AGENTS.md files (point 3) are not (yet) included and can
> be added incrementally by module maintainers.
>
> I used Claude to generate this PR, to show how these tools can also help
> us with this kind of work.
>
> Let me also respond to the individual points raised.
>
> @Leonard: Interesting idea about an AI agent for the user mailing list.
> It would also be great if we could integrate it into the Slack workspace
> itself for those who are more active there. I think that's a separate
> discussion worth having, but it's out of scope for this proposal. Would
> you like to start a dedicated thread for that?
>
> @Zakelly: Good point about architecture, performance, and code
> reusability. The AGENTS.md includes an "Architecture Boundaries" section
> and a "Common Change Patterns" section that maps change types to the
> modules they affect, which should help steer AI agents in the right
> direction. Regarding GitHub labels and bot reminders for AI-generated PRs:
> I think that's a good idea, but it would be a separate follow-up; we
> should get the baseline guidelines in place first.
>
> @Vaquar: Thanks for sharing. I think AGENTS.md and the PR template
> disclosure are the right starting point for Flink. Deterministic
> build-system gates are an interesting idea, but I'd want to see how the
> community's experience with AI contributions evolves before adding that
> level of enforcement. If you'd like to propose something concrete for
> Flink, a FLIP would be the right vehicle for that.
>
> Process question:
>
> Since these are contribution guidelines rather than API or architecture
> changes, I think a vote on this thread would be sufficient. But if the
> community feels this warrants a formal FLIP, I'm happy to go that route.
> What do others think?
>
> Feedback on the PR is welcome.
>
> Thanks, Martijn
>
> [1] https://github.com/apache/flink/pull/27776
>
> On Sat, Mar 14, 2026 at 6:13 AM vaquar khan <[email protected]>
> wrote:
>
>> Hi Martijn, Zakelly, and everyone,
>>
>> +1 to adding AGENTS.md. It's a great first step, and several other
>> Apache projects already follow the same approach.
>>
>> I saw this thread and thought I'd chime in because I'm actually working on
>> a draft KIP (Kafka Improvement Proposal) on this exact topic right now.
>>
>> To Zakelly's point about AI falling short on architecture: AGENTS.md is a
>> great guide, but it’s ultimately a "soft control." In my experience, LLMs
>> probabilistically ignore markdown instructions when their context windows
>> fill up or prompts drift.
>>
>> To really stop review fatigue, my KIP draft proposes adding a
>> deterministic "hard control" hooked directly into the build system. It
>> uses local AST parsing to automatically block PRs that are mostly empty
>> scaffolding/docstrings (low logic density) or violate core architectural
>> patterns. It catches the "AI slop" before a human ever has to look at it.
>>
>> If the community is interested, I’d be happy to share my draft KIP. It
>> might be a helpful reference if we want to explore a similar Maven-based
>> gate for Flink.
>>
>> Regards,
>>
>> Vaquar Khan
>>
>> On Thu, Mar 12, 2026 at 9:57 PM Zakelly Lan <[email protected]>
>> wrote:
>>
>> > Hi Martijn,
>> >
>> > Thanks for bringing this up. I'm +1 on this proposal.
>> >
>> > In the guidelines, I'd like to emphasize that contributors and reviewers
>> > should pay particular attention to architecture, performance, and code
>> > reusability. Based on my experience working with AI, code agents often
>> > fall short in these areas.
>> >
>> > Furthermore, I suggest we introduce mechanisms to ensure a smooth
>> > review process for AI-generated code, such as adding GitHub labels and a
>> > special reminder for reviewers from Flink's GitHub bot.
>> >
>> >
>> > Best,
>> > Zakelly
>> >
>> >
>> > On Fri, Mar 13, 2026 at 10:09 AM Rion Williams <[email protected]>
>> > wrote:
>> >
>> > > Hi Martijn,
>> > >
>> > > I think this is a great idea and definitely an effort worth pursuing;
>> > > it’s actually something I’ve been considering experimenting with
>> > > myself. A clear +1 from me, and I’d be happy to help as the effort
>> > > develops.
>> > >
>> > > On the reviewer side, we already have a pretty solid set of guardrails
>> > > and review processes in place, which is great. That said, it’s still
>> > > easy to become inundated by a large, random PR with little or no
>> > > context (sometimes clearly AI-driven). Establishing some guidelines
>> > > specifically around AI usage, both for providing development context
>> > > and for helping with the review/audit process, would be fantastic,
>> > > even if we start small and gradually evolve things over time.
>> > >
>> > > Thanks for kicking this off. Looking forward to hearing what others
>> > > think.
>> > >
>> > > Cheers,
>> > >
>> > > Rion
>> > >
>> > >
>> > > > On Mar 12, 2026, at 8:50 PM, Leonard Xu <[email protected]> wrote:
>> > > >
>> > > > Hi Martijn,
>> > > >
>> > > > Thanks for kicking off this discussion. I've been thinking along
>> > > > similar lines recently, so you have a +1 from me on this proposal.
>> > > >
>> > > > I also have a suggestion regarding activity on the user mailing
>> > > > list. Could we consider introducing an AI agent to help answer
>> > > > users' questions? I've noticed that many inquiries on user@flink
>> > > > currently go unanswered, yet most of them could be effectively
>> > > > addressed by an agent.
>> > > >
>> > > >
>> > > > Best,
>> > > > Leonard
>> > > >
>> > > >> On Mar 13, 2026, at 05:03, Martijn Visser <[email protected]> wrote:
>> > > >>
>> > > >> Hi all,
>> > > >>
>> > > >> I'd like to start a discussion about how the Flink community should
>> > > >> handle AI-assisted contributions and how we can make the Flink
>> > > >> codebase more accessible to AI tooling.
>> > > >>
>> > > >> The ASF has published guidance on generative AI tooling [1], and
>> > > >> several Apache projects have already adopted project-specific
>> > > >> guidelines on top of that. I think Flink should too.
>> > > >>
>> > > >> The most comprehensive example I've seen is Apache Airflow. They've
>> > > >> added an AGENTS.md [2] with instructions for AI coding agents,
>> > > >> including PR templates with an AI disclosure checkbox, a self-review
>> > > >> checklist, and the Generated-by: commit message token that the ASF
>> > > >> guidance recommends. Apache Iceberg recently adopted AI contribution
>> > > >> guidelines [3] focused on contributor accountability: you must be
>> > > >> able to debug, explain, and own the changes. Other projects like
>> > > >> Paimon [4], Mahout [5], and Ozone [6] have adopted similar policies.
>> > > >>
>> > > >> I'd like to propose the following for Flink:
>> > > >>
>> > > >> 1. Adopt contribution guidelines for AI-assisted PRs. Contributors
>> > > >> must disclose when AI tooling was used (using Generated-by: <Tool
>> > > >> Name and Version> in the commit message), and must be able to
>> > > >> explain and take ownership of all changes. AI-generated code is held
>> > > >> to the same review standards as human-written code.
>> > > >> 2. Add AGENTS.md files to the Flink repository. AGENTS.md [7] is a
>> > > >> convention for giving AI coding agents project-specific context. It
>> > > >> can contain information like build instructions, test commands,
>> > > >> coding conventions, and commit message format. I think we should add
>> > > >> one at the root of apache/flink.
>> > > >> 3. Add module-level context for AI tooling. This is where I think we
>> > > >> can go a step further. Each Flink module (e.g. flink-streaming-java,
>> > > >> flink-table-planner, flink-clients) would benefit from its own
>> > > >> AGENTS.md explaining the module's role, key abstractions, testing
>> > > >> patterns, and common pitfalls. This also serves as architectural
>> > > >> documentation that helps human contributors.
>> > > >>
>> > > >> I'm looking forward to hearing what others think about this.
>> > > >>
>> > > >> Best regards,
>> > > >>
>> > > >> Martijn
>> > > >>
>> > > >> [1] https://www.apache.org/legal/generative-tooling.html
>> > > >> [2] https://github.com/apache/airflow/blob/main/AGENTS.md
>> > > >> [3] https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
>> > > >> [4] https://github.com/apache/paimon/blob/master/.github/PULL_REQUEST_TEMPLATE.md?plain=1#L22
>> > > >> [5] https://github.com/apache/mahout/blob/main/docs/community/pr-policy-and-review-guidelines.md
>> > > >> [6] https://github.com/apache/ozone-site/blob/master/src/pages/release-notes/2.0.0.md?plain=1#L408
>> > > >> [7] https://agents.md/
>> > > >
>> > >
>> >
>>
>
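
The Generated-by: disclosure from point 1 of the original proposal is mechanical enough that a reviewer-side helper could check for it. The sketch below is a hypothetical illustration only, not part of the draft PR or any existing Flink or ASF tooling: the function name `find_generated_by` is invented here, and it assumes the disclosure appears as a "Key: value" line in the last paragraph of the commit message, which is where git places trailers.

```python
import re

# Matches a disclosure line such as "Generated-by: ExampleTool 1.0".
# Assumption: the disclosure is formatted like a git trailer ("Key: value").
_TRAILER = re.compile(r"^Generated-by:\s*(\S.*)$")


def find_generated_by(commit_message: str) -> list[str]:
    """Return the tool names declared via Generated-by: lines.

    Only the last paragraph of the message is inspected, mirroring how
    git treats trailers as a block at the end of the message.
    """
    paragraphs = [p for p in commit_message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    tools = []
    for line in paragraphs[-1].splitlines():
        match = _TRAILER.match(line.strip())
        if match:
            tools.append(match.group(1).strip())
    return tools
```

For anything beyond a sketch, `git interpret-trailers --parse` already extracts trailers from a commit message and is likely more robust than a hand-written regular expression.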

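Vaquar's message only outlines the "logic density" gate, and his draft KIP is not included in this thread, so the following is an independent illustration of the general idea rather than his design. It uses Python's standard `ast` module, so it would only apply to Python sources such as PyFlink; Flink's Java sources would need a Java parser instead. The metric, the names `logic_density` and `passes_gate`, and the 0.3 threshold are all assumptions made for this sketch.

```python
import ast

# Statement types treated as scaffolding rather than logic (assumption).
_STRUCTURAL = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
               ast.Import, ast.ImportFrom)


def _is_placeholder(stmt: ast.stmt) -> bool:
    """True for `pass`, `...`, and bare string literals such as docstrings."""
    if isinstance(stmt, ast.Pass):
        return True
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
        value = stmt.value.value
        return isinstance(value, str) or value is Ellipsis
    return False


def logic_density(source: str) -> float:
    """Share of counted statements that are neither scaffolding nor placeholders.

    Definitions and imports are skipped, but ast.walk still visits the
    statements inside their bodies. Returns 0.0 when nothing is counted.
    """
    total = 0
    substantive = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.stmt) or isinstance(node, _STRUCTURAL):
            continue
        total += 1
        if not _is_placeholder(node):
            substantive += 1
    return substantive / total if total else 0.0


def passes_gate(source: str, threshold: float = 0.3) -> bool:
    """Hypothetical gate: reject sources whose logic density is below threshold."""
    return logic_density(source) >= threshold
```

With this metric, a file of class and method stubs containing only docstrings and `...` scores 0.0 and fails the gate, while a function with a docstring and a single `return` scores 0.5. Any real threshold would need tuning against actual PRs.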