Reuven, thank you for bringing this topic.

As a new contributor to Beam codebase I raise two my hands for such guideline 
document and I'd propose to add it as a new guide into section “Other Guides” 
on web site documentation. 

For sure, there are already several very helpful and detailed guides, like 
“PTransform style guide” and “Runner authoring guide” that help a lot. However, 
IMO, it would make sense, perhaps, to have a new guide which is dedicated only 
to Code Review process and will be helpful as for new contributors so for 
reviewers too. Probably, it might look like a top list of common mistakes 
because of them some PRs were rejected and places where it is required to pay 
attention but, of course, format is open and need to be discussed.

I believe that it should reduce the number of common mistakes for newcomers 
like me and keep common the guide lines for all participants of review process.

WBR,
Alexey

> On 20 Feb 2018, at 14:01, Aljoscha Krettek <aljos...@apache.org> wrote:
> 
> This is excellent!
> 
> I can't really add anything right now but I think having a PR dashboard is 
> one of the most important points because it also indirectly solves "Review 
> Latency" and "Code Review Response SLA" by making things more visible.
> 
> --
> Aljoscha
> 
>> On 19. Feb 2018, at 19:32, Reuven Lax <re...@google.com 
>> <mailto:re...@google.com>> wrote:
>> 
>> There have been a number of threads on code reviews (most recently on a 
>> "State of the project" email). These threads have died out without much 
>> resolution, but I'm not sure that the concerns have gone away. 
>> 
>> First of all, I'm of the opinion that a code-review bar for Beam commits is 
>> critical to success of the project. This is a system with many subtle 
>> semantics, which might not be obvious at first glance. Beam pipelines 
>> process user data, and the consequence of certain bugs might mean corrupting 
>> user data and aggregations - something to avoid at all cost if we want Beam 
>> to be trusted. Finally Beam pipelines often run at extremely high scale; 
>> while many of our committers have a strong intuition for what can go wrong 
>> when running at high scale, not everybody who wants to contribute will  have 
>> this experience.
>> 
>> However, we also cannot afford to let our policy get in the way of building 
>> a community. We must remain a friendly place to develop and contribute.
>> 
>> When I look at concerns people have had on on code reviews (and I've been 
>> browsing most PRs this past year), I see a few common threads:
>> 
>> Review Latency
>> Latency on code reviews can be too high. At various times folks (most 
>> recently, Ahmet and I) have tried to regularly look for stale PRs and ping 
>> them, but latency still remains high. 
>> 
>> Pedantic
>> Overly-pedantic comments (change variable names, etc.) can be frustrating. 
>> The PR author can feel like they are being forced to make meaningless 
>> changes just so the reviewer will allow merging. Note that this is sometimes 
>> in the eye of the beholder - the reviewer may not think all these comments 
>> are pedantic.
>> 
>> Don't Do This
>> Sometimes a reviewer rejects an entire PR, saying that this should not be 
>> done. There are various reasons given: this won't scale, this will break 
>> backwards compatibility, this will break a specific runner, etc. The PR 
>> author may not always understand or agree with these reasons, and this can 
>> leave hurt feelings.
>> 
>> I would like open discussion about ways of making our code-review policy 
>> more welcoming. I'll seed the discussion with a few ideas:
>> 
>> Code Review Dashboard and Automation
>> We should invest in adding a code-review dashboard to our site, tracking 
>> stale PRs by reviewer. Quick turnaround on code reviews is essential 
>> building community, so all Beam committers should consider reviewing code as 
>> important as their own coding.  Spark has built a PR dashboard 
>> (https://spark-prs.appspot.com/ <https://spark-prs.appspot.com/>) which 
>> they’ve found better than Github’s dashboard; we could easily fork this 
>> dashboard. There are also tools that will automatically ping reviewers 
>> (mention-bot and hey there are two such tools). We can also make sure that 
>> new PRs are auto assigned a reviewer (e.g. 
>> https://github.com/imsky/pull-review <https://github.com/imsky/pull-review>)
>> 
>> Code Review Response SLA
>> It would be great if we could agree on a response-time SLA for Beam code 
>> reviews. The response might be “I am unable to do the review until next 
>> week,” however even that is better than getting no response.
>> 
>> Guideline Document
>> I think we should have a guideline document, explaining common reasons a 
>> reviewer might reject an approach in a  PR. e.g. "This will cause scaling 
>> problems," "This will cause problems for XXX runner," "This is backwards 
>> incompatible."  Reviewers can point to this doc as part of their comments, 
>> along with extra flavor. e.g. “as per the guideline doc, this will cause 
>> scaling problems, and here’s why.”
>> 
>> Guidelines on Comments
>> Not everyone agrees on which comments are pedantic, which makes it hard to 
>> have specific guidelines here. One general guideline me might adopt: if 
>> it'll take less time for the reviewer to make the changes themselves, it's 
>> not an appropriate comment. The reviewer should fix those issues  a 
>> follow-on PR. This adds a bit more burden on reviewers, but IMO is worth it 
>> to keep Beam a friendly environment. This is especially important for 
>> first-time contributors, who might otherwise lost interest. If the author is 
>> a seasoned Beam contributor, we can expect more out of them.  
>> 
>> We should also make sure that these fixups serve as educational moments for 
>> the new contributor. “Thanks for the contribution! I’ll be making some 
>> changes during the merge so that the code stays consistent across the 
>> codebase - keep an eye on them.” 
>> 
>> Would love to hear more thoughts.
>> 
>> Reuven
>> 
> 

Reply via email to