Re: [DISCUSS] AI-Assisted Contributions and AI Tooling Support in Apache Flink

Martijn Visser Mon, 16 Mar 2026 05:23:01 -0700

 Hi all,

Thanks for all the feedback and support. I've opened a draft PR [1] that
covers points 1 and 2 from the original proposal.


What's in the PR:

1. The PR includes an AGENTS.md at the repository root with prerequisites,
build/test commands, repository structure, architecture boundaries, common
change patterns, coding standards, testing standards, commit conventions,
and boundaries. It also updates the PR template with a dedicated AI
disclosure section (checkbox + Generated-by tag).
2. Module-level AGENTS.md files (point 3) are not (yet) included and can be
added incrementally by module maintainers.
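
For illustration only (this is not taken from the PR): a minimal sketch of how
the Generated-by disclosure could be checked mechanically. It assumes the token
sits on its own line of the commit message in the form
"Generated-by: <Tool Name and Version>" from the original proposal. The
function name, the regular expression, and the sample tool name are
hypothetical.

```python
import re
from typing import Optional

# Assumption: the disclosure is a single line "Generated-by: <Tool Name and Version>".
# The token name comes from the proposal; the matching rules here are illustrative.
_GENERATED_BY = re.compile(r"^Generated-by:[ \t]*(\S.*)$", re.MULTILINE)


def find_generated_by(commit_message: str) -> Optional[str]:
    """Return the disclosed tool string, or None if no disclosure line is present."""
    match = _GENERATED_BY.search(commit_message)
    if match is None:
        return None
    # Strip trailing whitespace (including a stray carriage return).
    return match.group(1).strip()


if __name__ == "__main__":
    # Hypothetical commit message; "ExampleTool 1.0" is a placeholder.
    message = (
        "Add AGENTS.md at the repository root\n"
        "\n"
        "Generated-by: ExampleTool 1.0\n"
    )
    print(find_generated_by(message))
```

A check like this only confirms that a disclosure line exists; it says nothing
about the quality of the change, which still goes through normal review.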

I used Claude to generate this PR, to show how these tools can help us with
this kind of work.

Let me also respond to the individual points raised.

@Leonard: Interesting idea about an AI agent for the users' mailing list,
but I'd think it would also be great if we could integrate it in the Slack
workspace itself for those that are more active there. I think that's a
separate discussion worth having, but out of scope for this proposal. Would
you like to start a dedicated thread for that?

@Zakelly: Good point about architecture, performance, and code reusability.
The AGENTS.md includes an "Architecture Boundaries" section and a "Common
Change Patterns" section that maps change types to the modules they affect,
which should help steer AI agents in the right direction. Regarding GitHub
labels and bot reminders for AI-generated PRs: I think that's a good idea,
but it belongs in a separate follow-up. I think we should get the baseline
guidelines in place first.

@Vaquar: Thanks for sharing. I think AGENTS.md and the PR template
disclosure are the right starting point for Flink. Deterministic
build-system gates are an interesting idea, but I'd want to see how the
community's experience with AI contributions evolves before adding that
level of enforcement. If you'd like to propose something concrete for
Flink, a FLIP would be the right vehicle for that.

Process question:

Since these are contribution guidelines rather than API or architecture
changes, I think a vote on this thread would be sufficient. But if the
community feels this warrants a formal FLIP, I'm happy to go that route.
What do others think?

Feedback on the PR is welcome.

Thanks, Martijn

[1] https://github.com/apache/flink/pull/27776

On Sat, Mar 14, 2026 at 6:13 AM vaquar khan <[email protected]> wrote:

> Hi Martijn, Zakelly, and everyone,
>
> +1 to adding AGENTS.md. It's a great first step
> as all other Apache projects follow the same approach.
>
> I saw this thread and thought I'd chime in because I'm actually working on
> a draft KIP on this exact topic right now.
>
> To Zakelly's point about AI falling short on architecture: AGENTS.md is a
> great guide, but it’s ultimately a "soft control." In my experience, LLMs
> probabilistically ignore markdown instructions when their context windows
> fill up or prompts drift.
>
> To really stop the review fatigue, my KIP draft proposes adding a
> deterministic "hard control" hooked directly into the build system. It uses
> local AST parsing to automatically block PRs that are mostly empty
> scaffolding/docstrings (low logic density) or violate core architectural
> patterns. It catches the "AI slop" before a human ever has to look at it.
>
> If the community is interested, I’d be happy to share my draft KIP. It
> might be a helpful reference if we want to explore a similar Maven-based
> gate for Flink.
>
> Regards,
>
> Vaquar Khan
>
> On Thu, Mar 12, 2026 at 9:57 PM Zakelly Lan <[email protected]> wrote:
>
> > Hi Martijn,
> >
> > Thanks for bringing this up. I'd +1 on this proposal.
> >
> > In the guidelines, I'd like to emphasize that contributors and reviewers
> > should pay particular attention to architecture, performance, and code
> > reusability. Based on my experience working with AI, code agents often
> > fall short in these areas.
> >
> > Furthermore, I suggest we introduce mechanisms to ensure a smooth
> > review process for AI-generated code, such as adding GitHub labels and a
> > special reminder for reviewers from Flink's GitHub bot.
> >
> >
> > Best,
> > Zakelly
> >
> >
> > On Fri, Mar 13, 2026 at 10:09 AM Rion Williams <[email protected]>
> > wrote:
> >
> > > Hi Martijn,
> > >
> > > I think this is a great idea and definitely an effort worth pursuing —
> > > it’s actually something I’ve been considering experimenting with myself.
> > > A clear +1 from me, and I’d be happy to help as the effort develops.
> > >
> > > On the reviewer side, we already have a pretty solid set of guardrails and
> > > review processes in place, which is great. That said, it’s still easy to
> > > become inundated by a large, random PR with little or no context (sometimes
> > > clearly AI-driven). Establishing some guidelines specifically around AI
> > > usage — both for providing development context and for helping with the
> > > review/audit process — would be fantastic, even if we start small and
> > > gradually evolve things over time.
> > >
> > > Thanks for kicking this off. Looking forward to hearing what others think.
> > >
> > > Cheers,
> > >
> > > Rion
> > >
> > >
> > > > On Mar 12, 2026, at 8:50 PM, Leonard Xu <[email protected]> wrote:
> > > >
> > > > Hi Martijn,
> > > >
> > > > Thanks for kicking off this discussion. I've been thinking along similar
> > > > lines recently, so you have a +1 from me on this proposal.
> > > >
> > > > I also have a suggestion regarding activity on the users' mailing list.
> > > > Could we consider introducing an AI agent to help answer users' questions?
> > > > I've noticed that many inquiries on user@flink currently go unanswered,
> > > > yet most of them could be effectively addressed by an agent.
> > > >
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > >> 2026 3月 13 05:03，Martijn Visser <[email protected]> 写道：
> > > >>
> > > >> Hi all,
> > > >>
> > > > >> I'd like to start a discussion about how the Flink community should handle
> > > > >> AI-assisted contributions and how we can make the Flink codebase more
> > > > >> accessible to AI tooling.
> > > >>
> > > > >> The ASF has published guidance on generative AI tooling [1], and several
> > > > >> Apache projects have already adopted project-specific guidelines on top of
> > > > >> that. I think Flink should too.
> > > >>
> > > > >> The most comprehensive example I've seen is Apache Airflow. They've added
> > > > >> an AGENTS.md [2] with instructions for AI coding agents, including PR
> > > > >> templates with an AI disclosure checkbox, a self-review checklist, and the
> > > > >> Generated-by: commit message token that the ASF guidance recommends. Apache
> > > > >> Iceberg recently adopted AI contribution guidelines [3] focused on
> > > > >> contributor accountability: you must be able to debug, explain, and own the
> > > > >> changes. Other projects like Paimon [4], Mahout [5], and Ozone [6] have
> > > > >> adopted similar policies.
> > > >>
> > > >> I'd like to propose the following for Flink:
> > > >>
> > > > >> 1. Adopt contribution guidelines for AI-assisted PRs. Contributors must
> > > > >> disclose when AI tooling was used (using Generated-by: <Tool Name and
> > > > >> Version> in the commit message), and must be able to explain and take
> > > > >> ownership of all changes. AI-generated code is held to the same review
> > > > >> standards as human-written code.
> > > > >> 2. Add AGENTS.md files to the Flink repository. AGENTS.md [7] is a
> > > > >> convention for giving AI coding agents project-specific context. It can
> > > > >> contain information like build instructions, test commands, coding
> > > > >> conventions, commit message format. I think we should add one at the root
> > > > >> of apache/flink.
> > > > >> 3. Add module-level context for AI tooling. This is where I think we can
> > > > >> take a step forward. Each Flink module (e.g. flink-streaming-java,
> > > > >> flink-table-planner, flink-clients) would benefit from its own AGENTS.md
> > > > >> explaining the module's role, key abstractions, testing patterns, and
> > > > >> common pitfalls. This also serves as architectural documentation that helps
> > > > >> human contributors.
> > > >>
> > > >> I'm looking forward to hearing what others think about this.
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Martijn
> > > >>
> > > >> [1] https://www.apache.org/legal/generative-tooling.html
> > > >> [2] https://github.com/apache/airflow/blob/main/AGENTS.md
> > > >> [3]
> > > >>
> > >
> >
> https://iceberg.apache.org/contribute/#guidelines-for-ai-assisted-contributions
> > > >> [4]
> > > >>
> > >
> >
> https://github.com/apache/paimon/blob/master/.github/PULL_REQUEST_TEMPLATE.md?plain=1#L22
> > > >> [5]
> > > >>
> > >
> >
> https://github.com/apache/mahout/blob/main/docs/community/pr-policy-and-review-guidelines.md
> > > >> [6]
> > > >>
> > >
> >
> https://github.com/apache/ozone-site/blob/master/src/pages/release-notes/2.0.0.md?plain=1#L408
> > > >> [7] https://agents.md/
> > > >
> > >
> >
>
