There have been a number of threads on code reviews (most recently on a
"State of the project" email). These threads have died out without much
resolution, but I'm not sure that the concerns have gone away.

First of all, I'm of the opinion that a code-review bar for Beam commits is
critical to success of the project. This is a system with many subtle
semantics, which might not be obvious at first glance. Beam pipelines
process user data, and the consequence of certain bugs might mean
corrupting user data and aggregations - something to avoid at all cost if
we want Beam to be trusted. Finally Beam pipelines often run at extremely
high scale; while many of our committers have a strong intuition for what
can go wrong when running at high scale, not everybody who wants to
contribute will  have this experience.

However, we also cannot afford to let our policy get in the way of building
a community. We *must* remain a friendly place to develop and contribute.

When I look at concerns people have had on on code reviews (and I've been
browsing most PRs this past year), I see a few common threads:

*Review Latency*

Latency on code reviews can be too high. At various times folks (most
recently, Ahmet and I) have tried to regularly look for stale PRs and ping
them, but latency still remains high.


Overly-pedantic comments (change variable names, etc.) can be frustrating.
The PR author can feel like they are being forced to make meaningless
changes just so the reviewer will allow merging. Note that this is
sometimes in the eye of the beholder - the reviewer may not think all these
comments are pedantic.

*Don't Do This*

Sometimes a reviewer rejects an entire PR, saying that this should not be
done. There are various reasons given: this won't scale, this will break
backwards compatibility, this will break a specific runner, etc. The PR
author may not always understand or agree with these reasons, and this can
leave hurt feelings.

I would like open discussion about ways of making our code-review policy
more welcoming. I'll seed the discussion with a few ideas:

*Code Review Dashboard and Automation*

We should invest in adding a code-review dashboard to our site, tracking
stale PRs by reviewer. Quick turnaround on code reviews is essential
building community, so all Beam committers should consider reviewing code
as important as their own coding.  Spark has built a PR dashboard ( which they’ve found better than Github’s
dashboard; we could easily fork this dashboard. There are also tools that
will automatically ping reviewers (mention-bot and hey there are two such
tools). We can also make sure that new PRs are auto assigned a reviewer

*Code Review Response SLA*

It would be great if we could agree on a response-time SLA for Beam code
reviews. The response might be “I am unable to do the review until next
week,” however even that is better than getting no response.

*Guideline Document*

I think we should have a guideline document, explaining common reasons a
reviewer might reject an approach in a  PR. e.g. "This will cause scaling
problems," "This will cause problems for XXX runner," "This is backwards
incompatible."  Reviewers can point to this doc as part of their comments,
along with extra flavor. e.g. “as per the guideline doc, this will cause
scaling problems, and here’s why.”

*Guidelines on Comments*

Not everyone agrees on which comments are pedantic, which makes it hard to
have specific guidelines here. One general guideline me might adopt: if
it'll take less time for the reviewer to make the changes themselves, it's
not an appropriate comment. The reviewer should fix those issues  a
follow-on PR. This adds a bit more burden on reviewers, but IMO is worth it
to keep Beam a friendly environment. This is especially important for
first-time contributors, who might otherwise lost interest. If the author
is a seasoned Beam contributor, we can expect more out of them.

We should also make sure that these fixups serve as educational moments for
the new contributor. “Thanks for the contribution! I’ll be making some
changes during the merge so that the code stays consistent across the
codebase - keep an eye on them.”

Would love to hear more thoughts.


Reply via email to