Re: [DISCUSS] AI-Assisted Contributions and AI Tooling Support in Apache Flink

Samrat Deb Mon, 23 Mar 2026 21:51:34 -0700

Hi Martijn,

+1 for the initiative.


I really liked the Iceberg-style guidelines [1]. AI-generated code must
face the same strict review standards as human code. The author must take
full ownership, explain the "why" behind the logic, and be able to debug it.

One word of caution regarding Leonard's idea of a support agent for the
user@flink list or Slack: let's tread very carefully here. The blast radius
of a hallucinated configuration (for example, mixing up RocksDBStateBackend
and HashMapStateBackend tuning) during a user's production crisis is massive
and could lead to data loss.
If we do build a support bot, it must be strictly constrained to our
official docs (maybe RAG-based initially, evolving from there) and must
carry the right disclaimer.
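
To make "strictly constrained" a bit more concrete, here is a rough Python
sketch, not a proposal for an actual implementation. Everything in it is an
assumption for illustration: the DocPassage type, the retrieve/answer helpers,
the 0.5 threshold, and the disclaimer wording are hypothetical, and the simple
token-overlap lookup merely stands in for a real RAG retriever. The point is
only the shape: answer verbatim from a docs passage or refuse, and always
attach the disclaimer.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical disclaimer text; the real wording would be up to the community.
DISCLAIMER = (
    "Automated answer drawn from the official docs. "
    "Verify before changing production settings."
)


@dataclass(frozen=True)
class DocPassage:
    source: str  # hypothetical identifier of an official docs page
    text: str    # passage text copied verbatim from that page


def _tokens(s: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(
    question: str, passages: List[DocPassage], min_overlap: float = 0.5
) -> Optional[DocPassage]:
    """Return the passage covering the largest share of the question's
    tokens, or None if no passage reaches min_overlap."""
    q = _tokens(question)
    if not q:
        return None
    best, best_score = None, 0.0
    for p in passages:
        score = len(q & _tokens(p.text)) / len(q)
        if score > best_score:
            best, best_score = p, score
    return best if best_score >= min_overlap else None


def answer(question: str, passages: List[DocPassage]) -> str:
    """Quote a docs passage verbatim, or refuse; never generate free text."""
    hit = retrieve(question, passages)
    if hit is None:
        return (
            "I could not find this in the official docs; "
            "please ask on the user list.\n" + DISCLAIMER
        )
    return f"{hit.text}\n(source: {hit.source})\n{DISCLAIMER}"
```

The refusal branch is the important design choice here: when nothing in the
docs matches, the bot says so instead of guessing.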

Bests,
Samrat
[1] https://iceberg.apache.org/contribute/#how-are-proposals-adopted

On Mon, Mar 23, 2026 at 7:59 PM Ramin Gharib <[email protected]> wrote:

> Hi Martijn,
>
> +1 from me.
>
> Thanks for bringing this up. It makes total sense to get ahead of this and
> set some clear guardrails as these tools become more popular.
>
> I really like the AGENTS.md approach. Explicitly laying out module-level
> context will definitely help reduce the noise from AI-generated PRs.
>
> Happy to see this move forward!
>
> Cheers,
>
> Ramin
>
> On Mon, Mar 23, 2026 at 2:59 PM Gustavo de Morais <[email protected]>
> wrote:
>
> > Hi Martijn,
> >
> > Thanks for driving this. I'm +1 for the initiative so we share knowledge
> > across the community. I'm also +1 to starting with only the root
> > AGENTS.md. A correct and thoroughly reviewed AGENTS.md for each module
> > should be a follow-up. In my experience, a shorter, correct context file
> > is better than longer, incorrect or outdated files, which create a bad
> > experience when using agents.
> >
> > I've reviewed the PR for the things I'm aware of. It'd be nice to have
> > other eyes from people with different expertise.
> >
> > Kind regards,
> >
> > Gustavo
> >
> >
> > On Mon, 23 Mar 2026 at 12:58, Martijn Visser <[email protected]>
> > wrote:
> >
> > > If there are no more comments, I'll start a vote later this week.
> > >
> > > On Mon, Mar 16, 2026 at 1:22 PM Martijn Visser <[email protected]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thanks for all the feedback and support. I've opened a draft PR [1]
> > > > that covers points 1 and 2 from the original proposal.
> > > >
> > > > What's in the PR:
> > > >
> > > > 1. The PR includes an AGENTS.md at the repository root with
> > > > prerequisites, build/test commands, repository structure, architecture
> > > > boundaries, common change patterns, coding standards, testing
> > > > standards, commit conventions, and boundaries. It also updates the PR
> > > > template with a dedicated AI disclosure section (checkbox +
> > > > Generated-by tag).
> > > > 2. Module-level AGENTS.md files (point 3) are not (yet) included and
> > > > can be added incrementally by module maintainers.
> > > >
> > > > I've used Claude to generate this PR, to show how these tools can also
> > > > help us with these things.
> > > >
> > > > Let me also respond to the individual points raised.
> > > >
> > > > @Leonard: Interesting idea about an AI agent for the users' mailing
> > > > list, but I think it would also be great if we could integrate it into
> > > > the Slack workspace itself for those who are more active there. I
> > > > think that's a separate discussion worth having, but out of scope for
> > > > this proposal. Would you like to start a dedicated thread for that?
> > > >
> > > > @Zakelly: Good point about architecture, performance, and code
> > > > reusability. The AGENTS.md includes an "Architecture Boundaries"
> > > > section and a "Common Change Patterns" section that maps change types
> > > > to the modules they affect, which should help steer AI agents in the
> > > > right direction. Regarding GitHub labels and bot reminders for
> > > > AI-generated PRs: I think that's a good idea but would be a separate
> > > > follow-up. I think we should get the baseline guidelines in place
> > > > first.
> > > >
> > > > @Vaquar: Thanks for sharing. I think AGENTS.md and the PR template
> > > > disclosure are the right starting point for Flink. Deterministic
> > > > build-system gates are an interesting idea, but I'd want to see how
> > > > the community's experience with AI contributions evolves before adding
> > > > that level of enforcement. If you'd like to propose something concrete
> > > > for Flink, a FLIP would be the right vehicle for that.
> > > >
> > > > Process question:
> > > >
> > > > Since these are contribution guidelines rather than API or
> > > > architecture changes, I think a vote on this thread would be
> > > > sufficient. But if the community feels this warrants a formal FLIP,
> > > > I'm happy to go that route. What do others think?
> > > >
> > > > Feedback on the PR is welcome.
> > > >
> > > > Thanks, Martijn
> > > >
> > > > [1] https://github.com/apache/flink/pull/27776
> > > >
> > > > On Sat, Mar 14, 2026 at 6:13 AM vaquar khan <[email protected]>
> > > > wrote:
> > > >
> > > >> Hi Martijn, Zakelly, and everyone,
> > > >>
> > > >> +1 to adding AGENTS.md. It's a great first step, as several other
> > > >> Apache projects follow the same approach.
> > > >>
> > > >> I saw this thread and thought I'd chime in because I'm actually
> > > >> working on a draft KIP on this exact topic right now.
> > > >>
> > > >> To Zakelly's point about AI falling short on architecture: AGENTS.md
> > > >> is a great guide, but it’s ultimately a "soft control." In my
> > > >> experience, LLMs probabilistically ignore markdown instructions when
> > > >> their context windows fill up or prompts drift.
> > > >>
> > > >> To really stop the review fatigue, my KIP draft proposes adding a
> > > >> deterministic "hard control" hooked directly into the build system.
> > > >> It uses local AST parsing to automatically block PRs that are mostly
> > > >> empty scaffolding/docstrings (low logic density) or violate core
> > > >> architectural patterns. It catches the "AI slop" before a human ever
> > > >> has to look at it.
> > > >>
> > > >> If the community is interested, I’d be happy to share my draft KIP.
> > > >> It might be a helpful reference if we want to explore a similar
> > > >> Maven-based gate for Flink.
> > > >>
> > > >> Regards,
> > > >>
> > > >> Vaquar Khan
> > > >>
> > > >> On Thu, Mar 12, 2026 at 9:57 PM Zakelly Lan <[email protected]>
> > > >> wrote:
> > > >>
> > > >> > Hi Martijn,
> > > >> >
> > > >> > Thanks for bringing this up. I'd +1 on this proposal.
> > > >> >
> > > >> > In the guidelines, I'd like to emphasize that contributors and
> > > >> > reviewers should pay particular attention to architecture,
> > > >> > performance, and code reusability. Based on my experience working
> > > >> > with AI, code agents often fall short in these areas.
> > > >> >
> > > >> > Furthermore, I suggest we introduce mechanisms to ensure a smooth
> > > >> > review process for AI-generated code, such as adding GitHub labels
> > > >> > and a special reminder for reviewers from Flink's GitHub bot.
> > > >> >
> > > >> >
> > > >> > Best,
> > > >> > Zakelly
> > > >> >
> > > >> >
> > > >> > On Fri, Mar 13, 2026 at 10:09 AM Rion Williams <[email protected]>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Martijn,
> > > >> > >
> > > >> > > I think this is a great idea and definitely an effort worth
> > > >> > > pursuing; it’s actually something I’ve been considering
> > > >> > > experimenting with myself. A clear +1 from me, and I’d be happy to
> > > >> > > help as the effort develops.
> > > >> > >
> > > >> > > On the reviewer side, we already have a pretty solid set of
> > > >> > > guardrails and review processes in place, which is great. That
> > > >> > > said, it’s still easy to become inundated by a large, random PR
> > > >> > > with little or no context (sometimes clearly AI-driven).
> > > >> > > Establishing some guidelines specifically around AI usage, both
> > > >> > > for providing development context and for helping with the
> > > >> > > review/audit process, would be fantastic, even if we start small
> > > >> > > and gradually evolve things over time.
> > > >> > >
> > > >> > > Thanks for kicking this off. Looking forward to hearing what
> > > >> > > others think.
> > > >> > >
> > > >> > > Cheers,
> > > >> > >
> > > >> > > Rion
> > > >> > >
> > > >> > >
> > > >> > > > On Mar 12, 2026, at 8:50 PM, Leonard Xu <[email protected]>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > Hi Martijn,
> > > >> > > >
> > > >> > > > Thanks for kicking off this discussion. I've been thinking along
> > > >> > > > similar lines recently, so you have a +1 from me on this
> > > >> > > > proposal.
> > > >> > > >
> > > >> > > > I also have a suggestion regarding activity on the users' mailing
> > > >> > > > list. Could we consider introducing an AI agent to help answer
> > > >> > > > users' questions? I've noticed that many inquiries on user@flink
> > > >> > > > currently go unanswered, yet most of them could be effectively
> > > >> > > > addressed by an agent.
> > > >> > > >
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Leonard
> > > >> > > >
> > > >> > > >> On Mar 13, 2026, at 05:03, Martijn Visser <[email protected]>
> > > >> > > >> wrote:
> > > >> > > >>
> > > >> > > >> Hi all,
> > > >> > > >>
> > > >> > > >> I'd like to start a discussion about how the Flink community
> > > >> > > >> should handle AI-assisted contributions and how we can make the
> > > >> > > >> Flink codebase more accessible to AI tooling.
> > > >> > > >>
> > > >> > > >> The ASF has published guidance on generative AI tooling [1], and
> > > >> > > >> several Apache projects have already adopted project-specific
> > > >> > > >> guidelines on top of that. I think Flink should too.
> > > >> > > >>
> > > >> > > >> The most comprehensive example I've seen is Apache Airflow.
> > > >> > > >> They've added an AGENTS.md [2] with instructions for AI coding
> > > >> > > >> agents, including PR templates with an AI disclosure checkbox, a
> > > >> > > >> self-review checklist, and the Generated-by: commit message
> > > >> > > >> token that the ASF guidance recommends. Apache Iceberg recently
> > > >> > > >> adopted AI contribution guidelines [3] focused on contributor
> > > >> > > >> accountability: you must be able to debug, explain, and own the
> > > >> > > >> changes. Other projects like Paimon [4], Mahout [5], and Ozone
> > > >> > > >> [6] have adopted similar policies.
> > > >> > > >>
> > > >> > > >> I'd like to propose the following for Flink:
> > > >> > > >>
> > > >> > > >> 1. Adopt contribution guidelines for AI-assisted PRs.
> > > >> > > >> Contributors must disclose when AI tooling was used (using
> > > >> > > >> Generated-by: <Tool Name and Version> in the commit message),
> > > >> > > >> and must be able to explain and take ownership of all changes.
> > > >> > > >> AI-generated code is held to the same review standards as
> > > >> > > >> human-written code.
> > > >> > > >> 2. Add AGENTS.md files to the Flink repository. AGENTS.md [7] is
> > > >> > > >> a convention for giving AI coding agents project-specific
> > > >> > > >> context. It can contain information like build instructions,
> > > >> > > >> test commands, coding conventions, and commit message format. I
> > > >> > > >> think we should add one at the root of apache/flink.
> > > >> > > >> 3. Add module-level context for AI tooling. This is where I
> > > >> > > >> think we can take a step forward. Each Flink module (e.g.
> > > >> > > >> flink-streaming-java, flink-table-planner, flink-clients) would
> > > >> > > >> benefit from its own AGENTS.md explaining the module's role, key
> > > >> > > >> abstractions, testing patterns, and common pitfalls. This also
> > > >> > > >> serves as architectural documentation that helps human
> > > >> > > >> contributors.
> > > >> > > >>
> > > >> > > >> I'm looking forward to hearing what others think about this.
> > > >> > > >>
> > > >> > > >> Best regards,
> > > >> > > >>
> > > >> > > >> Martijn
> > > >> > > >>
> > > >> > > >> [1] https://www.apache.org/legal/generative-tooling.html
> > > >> > > >> [2] https://github.com/apache/airflow/blob/main/AGENTS.md
> > > >> > > >> [3] https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
> > > >> > > >> [4] https://github.com/apache/paimon/blob/master/.github/PULL_REQUEST_TEMPLATE.md?plain=1#L22
> > > >> > > >> [5] https://github.com/apache/mahout/blob/main/docs/community/pr-policy-and-review-guidelines.md
> > > >> > > >> [6] https://github.com/apache/ozone-site/blob/master/src/pages/release-notes/2.0.0.md?plain=1#L408
> > > >> > > >> [7] https://agents.md/
