[DISCUSS] Git commits and metadata

David Arthur Tue, 25 Mar 2025 17:07:50 -0700

Hello Kafka community!

I wanted to start a discussion around our Git commits and the metadata we
keep in them.


We have historically used the "Reviewers" Git trailer [1] to indicate who
had reviewed a commit. Originally, it seems we used this field to indicate
the committer who approved the change (per our By-Laws). But over time, its
usage has expanded to include anyone (committer or not) who left a comment
on the PR.

I think acknowledging reviews is very important for our community, and I
want to continue doing this.

I also think it is important to record the committer who approved a given
PR.

Besides improving the quality of our Git log, I'm raising this issue
because of a limitation/quirk I've discovered in GitHub. While researching
https://github.com/apache/kafka/pull/19242 I found that GitHub will
automatically re-wrap the text of the PR body to fit 72 characters. This
will blindly break long "Reviewers" lines (which we regularly exceed). This
will make it difficult to easily find reviewers of PRs using the Git log.

I would like to propose that we start using the following Git trailers in
our PRs:

* Reviewed-by: anyone who left feedback on the PR
- * Approved-by: committers who approved the PR
- * Helped-by: shout-outs for inspiration or significant help
- * Signed-off-by: commit signatory
- * Fixes: KAFKA-12345 <https://issues.apache.org/jira/browse/KAFKA-12345>
- * References: KIP-123, KAFKA-23456, apache/flink#1234, etc

Each "-by" trailer should include only a single individual using the
standard "First Last <email>" format. Multiple instances of the same
trailer are allowed. Before merging, we would use a GitHub Action to verify
that at least one "Approved-by" is present.

This accomplishes a few things:

* Works around the GitHub 72 character limit
* Keeps a cleaner record of reviewers vs approvers
* Allows us to move metadata out of the commit subject, if desired (e.g.,
KAFKA-12345)

Structuring the metadata in this way allows us to use standard git commands
to parse the commits. This helps us move towards a fully automated process
where we populate these fields from the GitHub Pull Request data.

Let me know what you think!

David Arthur

[1] https://git-scm.com/docs/git-interpret-trailers

[DISCUSS] Git commits and metadata

Reply via email to