Hi JB,

You're right, I was thinking more about changes to core when talking about
the technical-excellence bar.

I think there still needs to be some bar for new features and extensions,
but I also think it can be much lower (nobody breaks anything by merging
them). One example of where we still need a bar is tests: if a new IO has a
test that the reviewer thinks will be flaky, that flaky test will cause
problems for _every_ Beam committer, and it's fair to ask for the test to
be changed.
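
For example, a timing-based assertion like the following is the kind of
thing a reviewer should be able to push back on. (The names here are made
up to illustrate the pattern; this is not real Beam code.)

    // Illustrative only: HypotheticalIOTest and FakeSink are stand-ins,
    // not real Beam code. The sleep-then-assert pattern is a classic
    // source of flakiness on a loaded CI machine.
    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    public class HypotheticalIOTest {
      @Test
      public void asyncWriteBecomesVisible() throws Exception {
        FakeSink sink = new FakeSink();
        sink.writeAsync("record");
        Thread.sleep(100); // flaky: assumes the write lands within 100 ms
        assertTrue(sink.contains("record"));
      }

      /** Minimal stand-in sink so the example compiles on its own. */
      static class FakeSink {
        private final java.util.Set<String> data =
            java.util.concurrent.ConcurrentHashMap.newKeySet();

        void writeAsync(String value) {
          new Thread(() -> data.add(value)).start();
        }

        boolean contains(String value) {
          return data.contains(value);
        }
      }
    }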

Given that the bar is lower for new extensions, I think we need a good way
of marking these things so that Beam users know they are not as mature as
other parts of Beam. Traditionally we've used @Experimental, but
@Experimental has been overloaded to mean other things as well. Maybe we
need to introduce a new annotation?
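
To make that concrete, here's a rough sketch of what such an annotation
could look like. The name @Incubating and the trackingIssue attribute are
placeholders, not a proposal:

    // Sketch only: @Incubating and trackingIssue are placeholder names,
    // not an agreed-upon Beam API.
    package org.apache.beam.sdk.annotations;

    import java.lang.annotation.Documented;
    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    /**
     * Signals that an IO or extension was merged under the lower review
     * bar and has not yet matured to the stability of the rest of Beam.
     */
    @Documented
    @Retention(RetentionPolicy.CLASS)
    @Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR})
    public @interface Incubating {
      /** Optional pointer to the JIRA issue tracking graduation. */
      String trackingIssue() default "";
    }

Unlike @Experimental, this would mean exactly one thing: new, merged under
the lower bar, use with appropriate expectations.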

Reuven

On Tue, Feb 20, 2018 at 5:48 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Reuven
>
> I agree with all your points, except maybe in terms of the bar level,
> especially for new features (like extensions or IOs). While PRs on the core
> should be heavily reviewed, I'm more in favor of merging these PRs pretty
> fast, even if they're not perfect. It's not a technical topic; it's really a
> contribution and community topic.
>
> Thanks anyway, the dashboard is a good idea!
>
> Regards
> JB
> On Feb 19, 2018, at 19:33, Reuven Lax <re...@google.com> wrote:
>>
>> There have been a number of threads on code reviews (most recently in the
>> "State of the project" email). These threads have died out without much
>> resolution, but I'm not sure the concerns have gone away.
>>
>> First of all, I'm of the opinion that a code-review bar for Beam commits
>> is critical to the success of the project. This is a system with many
>> subtle semantics that might not be obvious at first glance. Beam pipelines
>> process user data, and certain bugs can corrupt user data and aggregations
>> - something to avoid at all costs if we want Beam to be trusted. Finally,
>> Beam pipelines often run at extremely high scale; while many of our
>> committers have a strong intuition for what can go wrong at high scale,
>> not everybody who wants to contribute will have this experience.
>>
>>
>> However, we also cannot afford to let our policy get in the way of
>> building a community. We *must* remain a friendly place to develop and
>> contribute.
>>
>>
>> When I look at the concerns people have had about code reviews (and I've
>> been browsing most PRs this past year), I see a few common threads:
>>
>>
>> *Review Latency*
>>
>> Latency on code reviews can be too high. At various times folks (most
>> recently, Ahmet and I) have tried to regularly look for stale PRs and ping
>> them, but latency still remains high.
>>
>>
>> *Pedantic*
>>
>> Overly pedantic comments (requests to rename variables, etc.) can be
>> frustrating. The PR author can feel like they are being forced to make
>> meaningless changes just so the reviewer will allow the merge. Note that
>> pedantry is sometimes in the eye of the beholder - the reviewer may not
>> think all these comments are pedantic.
>>
>>
>> *Don't Do This*
>>
>> Sometimes a reviewer rejects an entire PR, saying that the change should
>> not be made at all, for various reasons: it won't scale, it will break
>> backwards compatibility, it will break a specific runner, etc. The PR
>> author may not always understand or agree with these reasons, and this can
>> lead to hurt feelings.
>>
>>
>> I would like to open a discussion about ways of making our code-review
>> policy more welcoming. I'll seed the discussion with a few ideas:
>>
>>
>> *Code Review Dashboard and Automation*
>>
>> We should invest in adding a code-review dashboard to our site, tracking
>> stale PRs by reviewer. Quick turnaround on code reviews is essential to
>> building community, so all Beam committers should consider reviewing code
>> as important as their own coding. Spark has built a PR dashboard
>> (https://spark-prs.appspot.com/) which they’ve found better than GitHub’s;
>> we could easily fork it. There are also tools that will automatically ping
>> reviewers (mention-bot and hey there are two such tools). We can also make
>> sure that new PRs are auto-assigned a reviewer (e.g.
>> https://github.com/imsky/pull-review).
>>
>>
>> *Code Review Response SLA*
>>
>> It would be great if we could agree on a response-time SLA for Beam code
>> reviews. The response might be “I am unable to do the review until next
>> week”; even that is better than getting no response.
>>
>>
>> *Guideline Document*
>>
>> I think we should have a guideline document explaining common reasons a
>> reviewer might reject an approach in a PR, e.g. "This will cause scaling
>> problems," "This will cause problems for XXX runner," "This is backwards
>> incompatible." Reviewers can point to this doc as part of their comments,
>> along with extra flavor, e.g. “as per the guideline doc, this will cause
>> scaling problems, and here’s why.”
>>
>>
>> *Guidelines on Comments*
>>
>> Not everyone agrees on which comments are pedantic, which makes it hard
>> to have specific guidelines here. One general guideline we might adopt: if
>> it would take less time for the reviewer to make the change themselves
>> than to ask for it, it's not an appropriate comment. The reviewer should
>> fix those issues in a follow-on PR. This adds a bit more burden on
>> reviewers, but IMO it's worth it to keep Beam a friendly environment. This
>> is especially important for first-time contributors, who might otherwise
>> lose interest. If the author is a seasoned Beam contributor, we can expect
>> more out of them.
>>
>>
>> We should also make sure that these fixups serve as educational moments
>> for the new contributor. “Thanks for the contribution! I’ll be making some
>> changes during the merge so that the code stays consistent across the
>> codebase - keep an eye on them.”
>>
>>
>> Would love to hear more thoughts.
>>
>>
>> Reuven
>>
>>
