Re: [DISCUSS] AI-Assisted Contributions and AI Tooling Support in Apache Flink

Martijn Visser Thu, 16 Apr 2026 08:29:33 -0700

Hi all,

I've opened up https://issues.apache.org/jira/browse/FLINK-39477 given that
there's consensus on getting this in, thank you all for your feedback!


Best regards,

Martijn

On Tue, Mar 24, 2026 at 5:52 AM Samrat Deb <[email protected]> wrote:

> Hi Martijn,
>
> +1 for the initiative.
>
> I really liked the Iceberg-style guidelines [1]. AI-generated code must
> face the same strict review standards as human code. The author must take
> full ownership, explain the "why" behind the logic, and be able to debug
> it.
>
> One word of caution regarding Leonard's idea of a support agent for the
> user@flink list or Slack. Let's tread very carefully here. The blast
> radius of a hallucinated configuration (for example, mixing up
> RocksDBStateBackend and HashMapStateBackend tuning) during a user's
> production crisis is massive and could lead to data loss.
> If we do build a support bot, it must be strictly constrained by our
> official docs (perhaps RAG-based initially, evolving from there) and must
> carry the right disclaimer.
>
> Bests,
> Samrat
> [1] https://iceberg.apache.org/contribute/#how-are-proposals-adopted
>
> On Mon, Mar 23, 2026 at 7:59 PM Ramin Gharib <[email protected]>
> wrote:
>
> > Hi Martijn,
> >
> > +1 from me.
> >
> > Thanks for bringing this up. It makes total sense to get ahead of this and
> > set some clear guardrails as these tools become more popular.
> >
> > I really like the AGENTS.md approach. Explicitly laying out module-level
> > context will definitely help reduce the noise from AI-generated PRs.
> >
> > Happy to see this move forward!
> >
> > Cheers,
> >
> > Ramin
> >
> > On Mon, Mar 23, 2026 at 2:59 PM Gustavo de Morais <[email protected]> wrote:
> >
> > > Hi Martijn,
> > >
> > > Thanks for driving this and I'm +1 for the initiative so we share knowledge
> > > across the community. I'm also +1 to starting with only the root AGENTS.md.
> > > Correct and thoroughly reviewed AGENTS.md should be a follow-up for each
> > > module. In my experience, a shorter and correct context file is better than
> > > longer, incorrect/outdated files which create a bad experience using
> > > agents.
> > >
> > >
> > >
> > >
> > > I've done a review of the PR for the things I'm aware of. It'd be nice to
> > > have other eyes from people with different expertise.
> > >
> > >  Kind regards,
> > >
> > >
> > >
> > >  Gustavo
> > >
> > >
> > > On Mon, 23 Mar 2026 at 12:58, Martijn Visser <[email protected]> wrote:
> > >
> > > > If there are no more comments, I'll start a vote later this week
> > > >
> > > > On Mon, Mar 16, 2026 at 1:22 PM Martijn Visser <[email protected]> wrote:
> > > >
> > > > >  Hi all,
> > > > >
> > > > > Thanks for all the feedback and support. I've opened a draft PR [1] that
> > > > > covers points 1 and 2 from the original proposal.
> > > > >
> > > > > What's in the PR:
> > > > >
> > > > > 1. The PR includes an AGENTS.md at the repository root with prerequisites,
> > > > > build/test commands, repository structure, architecture boundaries, common
> > > > > change patterns, coding standards, testing standards, commit conventions,
> > > > > and boundaries. It also updates the PR template with a dedicated AI
> > > > > disclosure section (checkbox + Generated-by tag).
> > > > > 2. Module-level AGENTS.md files (point 3) are not (yet) included and can
> > > > > be added incrementally by module maintainers.
> > > > >
> > > > > I've used Claude to generate this PR, to show how these tools can also
> > > > > help us with these things.
> > > > >
> > > > > Let me also respond to the individual points raised.
> > > > >
> > > > > @Leonard: Interesting idea about an AI agent for the users' mailing list,
> > > > > but I'd think it would also be great if we could integrate it in the Slack
> > > > > workspace itself for those that are more active there. I think that's a
> > > > > separate discussion worth having, but out of scope for this proposal. Would
> > > > > you like to start a dedicated thread for that?
> > > > >
> > > > > @Zakelly: Good point about architecture, performance, and code
> > > > > reusability. The AGENTS.md includes an "Architecture Boundaries" section
> > > > > and a "Common Change Patterns" section that maps change types to the
> > > > > modules they affect, which should help steer AI agents in the right
> > > > > direction. Regarding GitHub labels and bot reminders for AI-generated PRs:
> > > > > I think that's a good idea but would be a separate follow-up. I think we
> > > > > should get the baseline guidelines in place first.
> > > > >
> > > > > @Vaquar: Thanks for sharing. I think AGENTS.md and the PR template
> > > > > disclosure are the right starting point for Flink. Deterministic
> > > > > build-system gates are an interesting idea, but I'd want to see how the
> > > > > community's experience with AI contributions evolves before adding that
> > > > > level of enforcement. If you'd like to propose something concrete for
> > > > > Flink, a FLIP would be the right vehicle for that.
> > > > >
> > > > > Process question:
> > > > >
> > > > > Since these are contribution guidelines rather than API or architecture
> > > > > changes, I think a vote on this thread would be sufficient. But if the
> > > > > community feels this warrants a formal FLIP, I'm happy to go that route.
> > > > > What do others think?
> > > > >
> > > > > Feedback on the PR is welcome.
> > > > >
> > > > > Thanks, Martijn
> > > > >
> > > > > [1] https://github.com/apache/flink/pull/27776
> > > > >
> > > > > On Sat, Mar 14, 2026 at 6:13 AM vaquar khan <[email protected]> wrote:
> > > > >
> > > > >> Hi Martijn, Zakelly, and everyone,
> > > > >>
> > > > >> +1 to adding AGENTS.md. It's a great first step
> > > > >> as all other Apache projects follow the same approach.
> > > > >>
> > > > >> I saw this thread and thought I'd chime in because I'm actually working on
> > > > >> a draft KIP proposal on this exact topic right now.
> > > > >>
> > > > >> To Zakelly's point about AI falling short on architecture: AGENTS.md is a
> > > > >> great guide, but it’s ultimately a "soft control." In my experience, LLMs
> > > > >> probabilistically ignore markdown instructions when their context windows
> > > > >> fill up or prompts drift.
> > > > >>
> > > > >> To really stop the review fatigue, my KIP draft proposes adding a
> > > > >> deterministic "hard control" hooked directly into the build system. It uses
> > > > >> local AST parsing to automatically block PRs that are mostly empty
> > > > >> scaffolding/docstrings (low logic density) or violate core architectural
> > > > >> patterns. It catches the "AI slop" before a human ever has to look at it.
> > > > >>
> > > > >> If the community is interested, I’d be happy to share my draft KIP. It
> > > > >> might be a helpful reference if we want to explore a similar Maven-based
> > > > >> gate for Flink.
> > > > >>
> > > > >> Regards,
> > > > >>
> > > > >> Vaquar Khan
> > > > >>
> > > > >> On Thu, Mar 12, 2026 at 9:57 PM Zakelly Lan <[email protected]> wrote:
> > > > >>
> > > > >> > Hi Martijn,
> > > > >> >
> > > > >> > Thanks for bringing this up. I'd +1 on this proposal.
> > > > >> >
> > > > >> > In the guidelines, I'd like to emphasize that contributors and reviewers
> > > > >> > should pay particular attention to architecture, performance, and code
> > > > >> > reusability. Based on my experience working with AI, code agents often fall
> > > > >> > short in these areas.
> > > > >> >
> > > > >> > Furthermore, I suggest we introduce mechanisms to ensure a smooth
> > > > >> > review process for AI-generated code, such as adding GitHub labels and a
> > > > >> > special reminder for reviewers from Flink's GitHub bot.
> > > > >> >
> > > > >> >
> > > > >> > Best,
> > > > >> > Zakelly
> > > > >> >
> > > > >> >
> > > > >> > On Fri, Mar 13, 2026 at 10:09 AM Rion Williams <[email protected]> wrote:
> > > > >> >
> > > > >> > > Hi Martijn,
> > > > >> > >
> > > > >> > > I think this is a great idea and definitely an effort worth pursuing;
> > > > >> > > it’s actually something I’ve been considering experimenting with myself. A
> > > > >> > > clear +1 from me, and I’d be happy to help as the effort develops.
> > > > >> > >
> > > > >> > > On the reviewer side, we already have a pretty solid set of guardrails and
> > > > >> > > review processes in place, which is great. That said, it’s still easy to
> > > > >> > > become inundated by a large, random PR with little or no context (sometimes
> > > > >> > > clearly AI-driven). Establishing some guidelines specifically around AI
> > > > >> > > usage, both for providing development context and for helping with the
> > > > >> > > review/audit process, would be fantastic, even if we start small and
> > > > >> > > gradually evolve things over time.
> > > > >> > >
> > > > >> > > Thanks for kicking this off. Looking forward to hearing what others think.
> > > > >> > >
> > > > >> > > Cheers,
> > > > >> > >
> > > > >> > > Rion
> > > > >> > >
> > > > >> > >
> > > > >> > > > On Mar 12, 2026, at 8:50 PM, Leonard Xu <[email protected]> wrote:
> > > > >> > > >
> > > > >> > > > Hi Martijn,
> > > > >> > > >
> > > > >> > > > Thanks for kicking off this discussion. I've been thinking along similar
> > > > >> > > > lines recently, so you have a +1 from me on this proposal.
> > > > >> > > >
> > > > >> > > > I also have a suggestion regarding activity on the users' mailing list.
> > > > >> > > > Could we consider introducing an AI agent to help answer users' questions?
> > > > >> > > > I've noticed that many inquiries on user@flink currently go unanswered,
> > > > >> > > > yet most of them could be effectively addressed by an agent.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Leonard
> > > > >> > > >
> > > > >> > > >> On Mar 13, 2026, at 05:03, Martijn Visser <[email protected]> wrote:
> > > > >> > > >>
> > > > >> > > >> Hi all,
> > > > >> > > >>
> > > > >> > > >> I'd like to start a discussion about how the Flink community should handle
> > > > >> > > >> AI-assisted contributions and how we can make the Flink codebase more
> > > > >> > > >> accessible to AI tooling.
> > > > >> > > >>
> > > > >> > > >> The ASF has published guidance on generative AI tooling [1], and several
> > > > >> > > >> Apache projects have already adopted project-specific guidelines on top of
> > > > >> > > >> that. I think Flink should too.
> > > > >> > > >>
> > > > >> > > >> The most comprehensive example I've seen is Apache Airflow. They've added
> > > > >> > > >> an AGENTS.md [2] with instructions for AI coding agents, including PR
> > > > >> > > >> templates with an AI disclosure checkbox, a self-review checklist, and the
> > > > >> > > >> Generated-by: commit message token that the ASF guidance recommends. Apache
> > > > >> > > >> Iceberg recently adopted AI contribution guidelines [3] focused on
> > > > >> > > >> contributor accountability: you must be able to debug, explain, and own the
> > > > >> > > >> changes. Other projects like Paimon [4], Mahout [5], and Ozone [6] have
> > > > >> > > >> adopted similar policies.
> > > > >> > > >>
> > > > >> > > >> I'd like to propose the following for Flink:
> > > > >> > > >>
> > > > >> > > >> 1. Adopt contribution guidelines for AI-assisted PRs. Contributors must
> > > > >> > > >> disclose when AI tooling was used (using Generated-by: <Tool Name and
> > > > >> > > >> Version> in the commit message), and must be able to explain and take
> > > > >> > > >> ownership of all changes. AI-generated code is held to the same review
> > > > >> > > >> standards as human-written code.
> > > > >> > > >> 2. Add AGENTS.md files to the Flink repository. AGENTS.md [7] is a
> > > > >> > > >> convention for giving AI coding agents project-specific context. It can
> > > > >> > > >> contain information like build instructions, test commands, coding
> > > > >> > > >> conventions, and commit message format. I think we should add one at the
> > > > >> > > >> root of apache/flink.
> > > > >> > > >> 3. Add module-level context for AI tooling. This is where I think we can
> > > > >> > > >> take a step forward. Each Flink module (e.g. flink-streaming-java,
> > > > >> > > >> flink-table-planner, flink-clients) would benefit from its own AGENTS.md
> > > > >> > > >> explaining the module's role, key abstractions, testing patterns, and
> > > > >> > > >> common pitfalls. This also serves as architectural documentation that
> > > > >> > > >> helps human contributors.
> > > > >> > > >>
> > > > >> > > >> I'm looking forward to hearing what others think about this.
> > > > >> > > >>
> > > > >> > > >> Best regards,
> > > > >> > > >>
> > > > >> > > >> Martijn
> > > > >> > > >>
> > > > >> > > >> [1] https://www.apache.org/legal/generative-tooling.html
> > > > >> > > >> [2] https://github.com/apache/airflow/blob/main/AGENTS.md
> > > > >> > > >> [3] https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
> > > > >> > > >> [4] https://github.com/apache/paimon/blob/master/.github/PULL_REQUEST_TEMPLATE.md?plain=1#L22
> > > > >> > > >> [5] https://github.com/apache/mahout/blob/main/docs/community/pr-policy-and-review-guidelines.md
> > > > >> > > >> [6] https://github.com/apache/ozone-site/blob/master/src/pages/release-notes/2.0.0.md?plain=1#L408
> > > > >> > > >> [7] https://agents.md/
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
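
For reference, the disclosure convention proposed in the thread (a
"Generated-by: <Tool Name and Version>" line in the commit message) could be
checked with something like the sketch below. This is only an illustration
under stated assumptions, not part of FLINK-39477 or the PR: the function name
find_generated_by, the matching rules in the regex, and the example commit
message (including the tool name "ExampleTool 1.0") are all hypothetical.

```python
import re
from typing import List

# Assumed matching rule: a line that starts with "Generated-by:" followed by
# a non-empty tool string. The thread does not specify exact parsing rules.
GENERATED_BY = re.compile(r"^Generated-by:[ \t]*(\S.*)$", re.MULTILINE)


def find_generated_by(commit_message: str) -> List[str]:
    """Return the tool strings from every Generated-by line in the message."""
    return [tool.strip() for tool in GENERATED_BY.findall(commit_message)]


# Hypothetical commit message, used only to exercise the function.
example = """Example commit subject

Longer description of the change.

Generated-by: ExampleTool 1.0
"""

if __name__ == "__main__":
    print(find_generated_by(example))
```

A check like this only confirms that a disclosure line is present; it says
nothing about whether the author can explain and own the change, which is the
part the thread leaves to human review.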
