There was a suggestion of such a thing - a collection of less well-tested
and vetted contributions. The one that captures the spirit I have in mind is
the Piggybank from Apache Pig. The most important thing is to make sure that
committers and users both understand what it means. The biggest risk is users
coming to really rely on it and then having problems. Related to that is
contributors focusing their effort there and never doing the work to reach
the higher bar. But overall I am in favor of it, too.

Kenn

On Tue, Feb 20, 2018 at 7:54 AM, Reuven Lax <re...@google.com> wrote:

> On further thought I like the separate package even more. It allows us to
> easily isolate all those tests, and not block commits or releases on them.
>
> On Tue, Feb 20, 2018 at 7:45 AM, Reuven Lax <re...@google.com> wrote:
>
>> Another idea: we could have a special package for these "unrefined"
>> contributions. Once the contribution has had time to mature some, it can be
>> moved to the regular package structure.
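>>
>> For illustration only (the package name and the FooIO class are invented,
>> not an existing proposal), the split might look something like this:
>>
>>   // A new connector starts out in an "unrefined" namespace (hypothetical
>>   // name) with a lower review bar.
>>   package org.apache.beam.sdk.io.unrefined.foo;
>>
>>   /** Skeleton of a hypothetical connector awaiting graduation. */
>>   public class FooIO {
>>     // ...and would move to org.apache.beam.sdk.io.foo once it matures.
>>     private FooIO() {} // typical Beam IO style: no public constructor
>>   }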
>>
>> On Tue, Feb 20, 2018 at 7:41 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> I would add some @ToBeRefined or so 😁
>>> On Feb 20, 2018, at 16:35, Reuven Lax <re...@google.com> wrote:
>>>>
>>>> Hi JB,
>>>>
>>>> You're right, I was thinking more about changes to core when talking
>>>> about the technical-excellence bar.
>>>>
>>>> I think there still needs to be some bar for new features and
>>>> extensions, but I also think it can be much lower (as nobody is breaking
>>>> anything by merging this). An example of where we still need a bar here is
>>>> tests. If a new IO has a test that the reviewer thinks will be flaky, that
>>>> flaky test will cause problems for _every_ Beam committer, and it's fair to
>>>> ask for the test to be changed.
>>>>
>>>> Given that the bar is lower for new extensions, I think we need a good
>>>> way of marking these things so that Beam users know they are not as mature
>>>> as other parts of Beam. Traditionally we've used @Experimental, but
>>>> @Experimental has been overloaded to mean other things as well. Maybe we
>>>> need to introduce a new annotation?
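>>>>
>>>> For illustration only (the name, retention policy, and targets here are
>>>> invented, not an existing Beam annotation), such a marker could look
>>>> something like this:
>>>>
>>>>   import java.lang.annotation.Documented;
>>>>   import java.lang.annotation.ElementType;
>>>>   import java.lang.annotation.Retention;
>>>>   import java.lang.annotation.RetentionPolicy;
>>>>   import java.lang.annotation.Target;
>>>>
>>>>   /** Marks a contribution that has not yet met the usual maturity bar. */
>>>>   @Documented
>>>>   @Retention(RetentionPolicy.CLASS)
>>>>   @Target({ElementType.TYPE, ElementType.METHOD, ElementType.FIELD})
>>>>   public @interface Unrefined {
>>>>     /** Optional note on what is still missing before graduation. */
>>>>     String value() default "";
>>>>   }
>>>>
>>>> The point is just that, unlike @Experimental, its meaning would be
>>>> unambiguous.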
>>>>
>>>> Reuven
>>>>
>>>> On Tue, Feb 20, 2018 at 5:48 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Hi Reuven
>>>>>
>>>>> I agree with all your points except maybe in terms of bar level,
>>>>> especially on new features (like extensions or IOs). While PRs on the
>>>>> core should be heavily reviewed, I'm more in favor of merging such PRs
>>>>> pretty fast even if they are not perfect. It's not a technical topic,
>>>>> it's really a contribution and community topic.
>>>>>
>>>>> Thanks anyway, the dashboard is a good idea!
>>>>>
>>>>> Regards
>>>>> JB
>>>>> On Feb 19, 2018, at 19:33, Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>> There have been a number of threads on code reviews (most recently on
>>>>>> a "State of the project" email). These threads have died out without much
>>>>>> resolution, but I'm not sure that the concerns have gone away.
>>>>>>
>>>>>> First of all, I'm of the opinion that a code-review bar for Beam
>>>>>> commits is critical to the success of the project. This is a system with
>>>>>> many subtle semantics, which might not be obvious at first glance. Beam
>>>>>> pipelines process user data, and the consequence of certain bugs might
>>>>>> mean corrupting user data and aggregations - something to avoid at all
>>>>>> costs if we want Beam to be trusted. Finally, Beam pipelines often run at
>>>>>> extremely high scale; while many of our committers have a strong
>>>>>> intuition for what can go wrong when running at high scale, not everybody
>>>>>> who wants to contribute will have this experience.
>>>>>>
>>>>>>
>>>>>> However, we also cannot afford to let our policy get in the way of
>>>>>> building a community. We *must* remain a friendly place to develop
>>>>>> and contribute.
>>>>>>
>>>>>>
>>>>>> When I look at the concerns people have had about code reviews (and
>>>>>> I've been browsing most PRs this past year), I see a few common threads:
>>>>>>
>>>>>>
>>>>>> *Review Latency*
>>>>>>
>>>>>> Latency on code reviews can be too high. At various times folks (most
>>>>>> recently, Ahmet and I) have tried to regularly look for stale PRs and 
>>>>>> ping
>>>>>> them, but latency still remains high.
>>>>>>
>>>>>>
>>>>>> *Pedantic*
>>>>>>
>>>>>> Overly pedantic comments (change variable names, etc.) can be
>>>>>> frustrating. The PR author can feel like they are being forced to make
>>>>>> meaningless changes just so the reviewer will allow merging. Note that
>>>>>> this is sometimes in the eye of the beholder - the reviewer may not
>>>>>> think all of these comments are pedantic.
>>>>>>
>>>>>>
>>>>>> *Don't Do This*
>>>>>>
>>>>>> Sometimes a reviewer rejects an entire PR, saying that this should
>>>>>> not be done. There are various reasons given: this won't scale, this will
>>>>>> break backwards compatibility, this will break a specific runner, etc. 
>>>>>> The
>>>>>> PR author may not always understand or agree with these reasons, and this
>>>>>> can leave hurt feelings.
>>>>>>
>>>>>>
>>>>>> I would like to open a discussion about ways of making our code-review
>>>>>> policy more welcoming. I'll seed the discussion with a few ideas:
>>>>>>
>>>>>>
>>>>>> *Code Review Dashboard and Automation*
>>>>>>
>>>>>> We should invest in adding a code-review dashboard to our site,
>>>>>> tracking stale PRs by reviewer. Quick turnaround on code reviews is
>>>>>> essential to building community, so all Beam committers should consider
>>>>>> reviewing code as important as their own coding. Spark has built a
>>>>>> PR dashboard (https://spark-prs.appspot.com/) which they’ve found
>>>>>> better than Github’s dashboard; we could easily fork it. There are also
>>>>>> tools that will automatically ping reviewers (mention-bot and hey there
>>>>>> are two such tools). We can also make sure that new PRs are
>>>>>> auto-assigned a reviewer (e.g. https://github.com/imsky/pull-review).
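>>>>>>
>>>>>> As a rough sketch of the kind of automation I mean (this uses GitHub's
>>>>>> public search API via Java 11's HttpClient; the cutoff date is just an
>>>>>> example, and a real dashboard would parse the JSON and group PRs by
>>>>>> reviewer):
>>>>>>
>>>>>>   import java.net.URI;
>>>>>>   import java.net.http.HttpClient;
>>>>>>   import java.net.http.HttpRequest;
>>>>>>   import java.net.http.HttpResponse;
>>>>>>
>>>>>>   public class StalePrs {
>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>       // Ask GitHub for open Beam PRs with no updates since the cutoff.
>>>>>>       String q = "repo:apache/beam+is:pr+is:open+updated:<2018-02-01";
>>>>>>       HttpRequest request = HttpRequest.newBuilder()
>>>>>>           .uri(URI.create("https://api.github.com/search/issues?q=" + q))
>>>>>>           .header("Accept", "application/vnd.github.v3+json")
>>>>>>           .build();
>>>>>>       HttpResponse<String> response = HttpClient.newHttpClient()
>>>>>>           .send(request, HttpResponse.BodyHandlers.ofString());
>>>>>>       System.out.println(response.body()); // raw JSON of stale PRs
>>>>>>     }
>>>>>>   }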
>>>>>>
>>>>>>
>>>>>> *Code Review Response SLA*
>>>>>>
>>>>>> It would be great if we could agree on a response-time SLA for Beam
>>>>>> code reviews. The response might be “I am unable to do the review until
>>>>>> next week,” but even that is better than getting no response.
>>>>>>
>>>>>>
>>>>>> *Guideline Document*
>>>>>>
>>>>>> I think we should have a guideline document explaining common
>>>>>> reasons a reviewer might reject an approach in a PR, e.g. "This will
>>>>>> cause scaling problems," "This will cause problems for XXX runner," or
>>>>>> "This is backwards incompatible." Reviewers can point to this doc as
>>>>>> part of their comments, along with extra flavor, e.g. “as per the
>>>>>> guideline doc, this will cause scaling problems, and here’s why.”
>>>>>>
>>>>>>
>>>>>> *Guidelines on Comments*
>>>>>>
>>>>>> Not everyone agrees on which comments are pedantic, which makes it
>>>>>> hard to have specific guidelines here. One general guideline we might
>>>>>> adopt: if it would take less time for the reviewer to make the changes
>>>>>> themselves than to go through another review round, it's not an
>>>>>> appropriate comment; the reviewer should fix those issues in a follow-on
>>>>>> PR. This adds a bit more burden on reviewers, but IMO it is worth it to
>>>>>> keep Beam a friendly environment. This is especially important for
>>>>>> first-time contributors, who might otherwise lose interest. If the
>>>>>> author is a seasoned Beam contributor, we can expect more of them.
>>>>>>
>>>>>>
>>>>>> We should also make sure that these fixups serve as educational
>>>>>> moments for the new contributor. “Thanks for the contribution! I’ll be
>>>>>> making some changes during the merge so that the code stays consistent
>>>>>> across the codebase - keep an eye on them.”
>>>>>>
>>>>>>
>>>>>> Would love to hear more thoughts.
>>>>>>
>>>>>>
>>>>>> Reuven
>>>>>>
>>>>>>
>>>>
>>
>
