Re: Code review

Gian Merlino Tue, 23 Apr 2019 15:11:18 -0700

Hey Roman,

Thank you too for participating.


> Apart from code review friction, I think there is another important
> (ultimately, more important) problem/goal for the long term project
success
> which cannot be discussed separately: keeping the codebase highly
> maintainable. I'm a Druid developer for about 2.5 years, and I definitely
> feel that making changes is more difficult than it was 2.5 years ago. It
> means that the codebase is aging not very well. The project is on the
blue,
> not orange trajectory here:
> https://martinfowler.com/bliki/DesignStaminaHypothesis.html.
>
> So, regarding the part of Gian's proposal that is essentially saying
"let's
> lower review standards": I think the current quality of contributions is
> not high enough. If we lower the review standards, as it currently goes,
> the contribution quality will drop, too. The project may end up with a
> great community but not being able to match the innovation pace happening
> in similar other projects.

To the first point, I think that long-term maintainability is critically
important, but also that very few review comments end up making meaningful
differences here. Method naming and minor-scope factorings are small
potatoes next to what really matters: how the system is separated into
components, and what the interfaces between components look like. If we get
this right then the system will be pleasant to work on far into the future.
If we don't, then it doesn't matter how nicely methods are named or how
consistently the code is formatted. Fwiw, I definitely think Druid's design
in this regard is inconsistent - some components are well designed and some
are not. I have a personal hit-list of components and interfaces that I
think could use a rethink in order to improve maintainability, and it
sounds like you do too. It would be good to have that discussion somehow. I
think it's a separate discussion from this one, but, it'd still be good.

To the second point, I am emphatically not saying "let's lower review
standards". I am saying let's try to have a review process that doesn't
make authors feel like they're being made to jump through hoops. As
reviewers, I think the most meaningful changes we can make towards this
culturally are striving to minimize the amount of back-and-forth, and
striving to write more substantive review comments (like suggesting
specific new comments rather than 'add a comment here'). The latter is
doubly good: it makes review feel less arbitrary, and it also helps
propagate culture by showing the 'Druid way' of doing things. I think there
is also value in minimizing changes to a patch after review has begun, and
to that end, I'd encourage avoiding asking for too many 'nice to have'
refactors. Which, as mentioned above, I don't think would have a meaningful
effect on long-term maintainability anyway.

> One concrete example: code style comments. Personally, I leave a lot of
> comments about violations of the Druid's code style. However, I very
rarely
> leave such comments on Gian's PRs because he is one of the people who
> created the style and follows it.

Well, I think this is a good reason to have a robots-only policy for code
style checking. The fact that I am one of the few people that can adhere to
the unwritten style rules, because I've been working on Druid for over 7
years, means that the system today doesn't work.

> 2. Communicate in PR reviews *proactively*:

I think everything you mentioned here is common sense and good advice. I
would add one thing: for all the "authors should proactively…" stuff,
reviewers should mention that when they leave comments. For example,
instead of "why did you do this?" a better review comment is "I can't
understand why you did this. Could you please add a code comment explaining
it?" Even better, if you suspect the code is wrong: "I can't understand why
you did this. It looks like it might be incorrect for X reason. Could you
please look into that, and if it seems like the code is correct after all,
add a comment explaining how it works?"

It's a bit more work for reviewers but I think it makes a real difference
in the experience that authors have. It minimizes back-and-forth, which
minimizes frustration. And it helps propagate culture, since authors don't
just see what is being asked for, but also why, and they get a sense of
what the project's shared values are.

> 3. Regularly review other people's PRs, don't just publish your code. Gian
> have mentioned this. Addition from me: review PRs by people from not your
> org regularly.

+1000

> 4. Regularly dedicate time to do a PR that increases automation of the
> project, for example:
>     - Automate something about the CI or testing
>     - Add a static analysis (IntelliJ) check
>     - Add a Checkstyle check

Anything we can move from manual review to robotic review is good. I would
caution against getting overly aggressive with static analysis &
checkstyle, but I don't think we're there yet. The checks we have don't
seem to be excessive.

> 5. Regularly dedicate time to do a PR that is devoted specifically to
> repaying tech dept (yak shaving, refactoring)

These are tough ones. Because all refactorings have risk, and because they
are not expected to yield bug fixes or functional improvements, they tend
to languish in the review queue. So extra work is needed by authors to
motivate review of them. I don't do too many of these sorts of PRs myself,
but when I do, I try to attach motivation. One example is
https://github.com/apache/incubator-druid/pull/4889, which contained a
refactoring of the SQL operator conversion subsystem. I thought it was
worth it because the old design had the SqlAggregator operator interface be
responsible for adding filters, which was a bad design since it
complexified all implementations and was a mixing of concerns. It also
eliminated the difference between builtin operators and operators that use
the SqlAggregator interface, by moving all the builtins to the interface.
That also improved things by merging two different code paths into one,
simplifying the logic. I could have motivated it better by going into a bit
more detail, but, I guess I tried.

>     - Refactor one old ugly class
>     - Refactor one old ugly method

I don't think there is necessarily much value in this sort of refactoring,
if it's not making functional improvements or improving component design in
a way that makes long-term maintainability better. There may even be
negative value, since refactors take time to review and incur risk of
introducing new bugs.

>    - Read some part of the code, absorb information about it, and add
> comments to that code which explain how does that code work. Note: in my
> opinion, chronic lack of comments, Javadocs to classes, inter-links
between
> classes/methods, and comments about small statement-level, yet non-obvious
> things is one of the major reasons why the codebase ages poorly. I feel
> that a significant part of that decrease of programming productivity that
I
> feel for myself while developing Druid comes from the need to make
research
> for myself, encountering puzzling code, and poor navigability of the code.

This one is different. There is definitely value here. I also think that
under-documentation (of intended usage, known gotchas, etc) serves the
codebase poorly and adding more is good.

> Now, if we (regular contributors) do the above, the reviewing standards
for
> PRs from non-regular contributors (called "external" above) can be
somewhat
> lowered, because that is only a small portion of the contributions and we
> can tolerate their quality being lower than the quality of "internal"
> contributions. We cannot demand non-regular contributors to do what is
> described as "the right thing" above.

Repeating myself from earlier in this mail, I don't think we need to lower
standards. I think in all cases (regular and non-regular contributors) we
should apply the same review principles to individual patches, and those
review principles should be geared towards minimizing back-and-forth,
encouraging contribution, while also making sure that the really bad stuff
doesn't make it into the codebase (bugs, breakages of existing use cases,
poorly-thought-out new components, excessive maintenance burdens).

On Tue, Apr 23, 2019 at 10:59 AM Roman Leventov <leventov...@gmail.com>
wrote:

> Apart from code review friction, I think there is another important
> (ultimately, more important) problem/goal for the long term project success
> which cannot be discussed separately: keeping the codebase highly
> maintainable. I'm a Druid developer for about 2.5 years, and I definitely
> feel that making changes is more difficult than it was 2.5 years ago. It
> means that the codebase is aging not very well. The project is on the blue,
> not orange trajectory here:
> https://martinfowler.com/bliki/DesignStaminaHypothesis.html.
>
> So, regarding the part of Gian's proposal that is essentially saying "let's
> lower review standards": I think the current quality of contributions is
> not high enough. If we lower the review standards, as it currently goes,
> the contribution quality will drop, too. The project may end up with a
> great community but not being able to match the innovation pace happening
> in similar other projects.
>
> To discuss how we can make the quality of contributions higher, let's first
> note that there are two types of contributions:
>  - "Internal": contributions by regular contributors. Developing Druid is a
> large part of their day job at their companies. I think currently it's
> about 80% of all contributions, the majority of that is from employees of
> Imply.
>  - "External": by people who are not regular contributors. Developing Druid
> is usually not part of their job.
>
> The good news is that since the majority of contributions are coming from
> the employees of companies which have a stake in the Druid's success, there
> is a way to increase the quality of contributions _and_ reduce the code
> review friction: ask the regular contributors to "do the right thing".
>
> One concrete example: code style comments. Personally, I leave a lot of
> comments about violations of the Druid's code style. However, I very rarely
> leave such comments on Gian's PRs because he is one of the people who
> created the style and follows it. So the way to reduce the friction from
> such comments is to ask regular contributors to learn and follow that style
> themselves. This way the number of such comments will reduce drastically,
> simply because there would be nothing to comment about.
>
> I think "the right thing" for regular contributors can be boiled down to
> the following:
>
> 1. Self-review the PR before publishing, according to a set of standards
> and carefully chosen self-ask questions (checklist items).
>
> 2. Communicate in PR reviews *proactively*:
>    - Both the author and reviewers proactively make the life of other
> parties easier, regardless of whether the other party did make their life
> easy in the PR or not:
>        - The author makes the description of the PR elaborate, yet
> highlights changed/added APIs, important changes, design considerations,
> etc. (Both Gian and Slim have mentioned that.)
>        - A reviewer provides the author with necessary information about
> their comments: links, etc. A reviewer doesn't ask rhetoric questions (but:
> asks real questions, if they have some).
>    - If a reviewer asks questions about the change because they don't
> understand something about the code, the author proactively adds comments
> to the code or restructures the code so that that the reviewer wouldn't
> have that question in the first place.
>    - If a reviewer points to some problem, the author proactively searches
> for all similar problems in your PR and fix them, without waiting for a
> reviewer to find them for you.
>    - If either the author or a reviewer notices some side problem during
> the discussion, they proactively create an issue about that problem
> themselves, not waiting for the other party to do that.
>    - If either the author or a reviewer discovers something new for
> themselves during the discussion, they proactively internalize that new
> knowledge, so that it becomes the part of their knowledge, and they will
> share that knowledge with somebody else during a different PR review
> discussion if needed. This knowledge may be about the code style, the
> programming practice, some specific knowledge about the project. Note1: the
> concrete example of code style comments from above falls down here. Note2:
> if the discovered knowledge about the code style or the programming
> practice is not part of the standards, it should be added there. Note3: the
> specific knowledge about the project should be put somewhere in the
> comments in the code itself and linked from all the places where it is also
> relevant.
>
> 3. Regularly review other people's PRs, don't just publish your code. Gian
> have mentioned this. Addition from me: review PRs by people from not your
> org regularly.
>
> 4. Regularly dedicate time to do a PR that increases automation of the
> project, for example:
>     - Automate something about the CI or testing
>     - Add a static analysis (IntelliJ) check
>     - Add a Checkstyle check
>
> Gian mentioned the need to increase automation, but without committing to
> regular practice by all regular contributors, I'm afraid it will remain
> words.
>
> 5. Regularly dedicate time to do a PR that is devoted specifically to
> repaying tech dept (yak shaving, refactoring), for example:
>     - Refactor one old ugly class
>     - Refactor one old ugly method
>     - Read some part of the code, absorb information about it, and add
> comments to that code which explain how does that code work. Note: in my
> opinion, chronic lack of comments, Javadocs to classes, inter-links between
> classes/methods, and comments about small statement-level, yet non-obvious
> things is one of the major reasons why the codebase ages poorly. I feel
> that a significant part of that decrease of programming productivity that I
> feel for myself while developing Druid comes from the need to make research
> for myself, encountering puzzling code, and poor navigability of the code.
>      - Improve documentation of some configuration parameter
>
> Notes about the last three points ("Regularly do X"):
>  - A rate of regularity may vary significantly, but there should be some
> slice of the time that *every* regular contributor devotes to *every*
> mentioned activity, even as little as 1 hour in 2 weeks (about 1% of their
> time). Just not zero.
>  - At least for myself, I find that the only way to start doing something
> regularly is to put it in my calendar, not just promise to do that for
> myself.
>
> Now, if we (regular contributors) do the above, the reviewing standards for
> PRs from non-regular contributors (called "external" above) can be somewhat
> lowered, because that is only a small portion of the contributions and we
> can tolerate their quality being lower than the quality of "internal"
> contributions. We cannot demand non-regular contributors to do what is
> described as "the right thing" above.
>
> On Tue, 23 Apr 2019 at 04:43, Gian Merlino <g...@apache.org> wrote:
>
> > Hey Druids,
> >
> > Sometimes I feel like this too:
> > https://twitter.com/julianhyde/status/1108502471830204416.
> >
> > I believe our code review process today has too much friction in it, and
> > that we can work to reduce it. The two main frictions I see are code
> > reviews not happening in a timely manner, and code reviews sometimes
> > descending into a swamp of nit-picks. The good news, at least, is that we
> > are not the first development team to have these problems, and that they
> > can be solved. I've written some thoughts below about principles that I
> > think might help. I am not necessarily proposing making these the law of
> > the land, but am hoping to start a discussion about how we can generally
> > improve things.
> >
> > 1) "Let robots handle style checks." Encode Druid's code style in
> > checkstyle or other tools, and avoid making subjective style comments
> > directly as humans. If we can't figure out how to set up automated
> > verification for a particular style point, then give it up. Rationale:
> > authors can self-check style when checking is automated. Also, it's
> better
> > for robots to enforce style, because software development is a social
> > endeavor, and people don't mind style nit-picking from robots as much as
> > they do from humans.
> >
> > 2) "Write down what really matters." I suggest we put together a short,
> > highly visible list of things that should be considered commit-blockers
> for
> > patches. Not a list of best practices, but a list of true blockers.
> "Short"
> > is in service of point (3), below. "Highly visible" is so it can act as a
> > shared set of values. My suggested list would be correctness of
> > implementation, documentation of new or changed functionality, tests for
> > reasonably testable functionality, avoidance of excessive additional
> > maintenance burden, and avoidance of breaking existing use cases
> (including
> > things that would break clusters that run at large scale, or that would
> > break rolling updates). Some of these points are subjective but I think
> > it's still a good start. Rationale: authors will know what is expected of
> > them, which should generally improve PR quality, and speed up review.
> Also,
> > similar to the previous point: people are generally happy to follow what
> > they perceive as community-maintained standards, but not as happy to
> follow
> > what they perceive as the opinion of a single reviewer.
> >
> > 3) "Minimize hoop-jumping." We should make an effort to avoid the '74
> > one-line emails' problem. Endeavor to make fewer and fewer comments as
> the
> > number of rounds of review of a PR gets higher. Endeavor to keep the
> number
> > of rounds of review from getting too high. Authors can help here by
> > explaining their decisions clearly, and by avoiding making large changes
> in
> > their patches after review has started. Reviewers can help by making
> their
> > first review as comprehensive as possible, to avoid the need to bring up
> > brand-new points on later rounds of review. Reviewers can also help by
> > writing out changes they're asking for explicitly (suggest a new comment,
> > doc wording, method name, etc). Reviewers can also help by noting in a
> > review comment when a suggestion is just a suggestion -- useful because
> > many authors are likely to interpret an unqualified comment as a commit
> > blocker. Rationale: the more rounds of review a PR goes through, and the
> > more a review feels like a collection of disconnected 1-liner comments,
> the
> > more it makes the original author feel as if he or she is being made to
> > jump through hoops. This makes people less likely to contribute in the
> > future, and damages efforts to build community.
> >
> > 4) "Pitch in on reviews." A relatively small number of committers are
> doing
> > most of the code reviews. These people have limited time, and it means
> that
> > PRs often stay in the review queue for a while without anyone looking at
> > them. Any committer can pitch in, and in fact, even non-committers can
> > (although their votes are nonbinding, their reviews are still helpful for
> > social reasons). So anyone interested in reviewing is encouraged to do
> so.
> > Rationale: improves bandwidth, prevents burning out the small number of
> > volunteers that participate in reviews.
> >
> > Looking forward to seeing what people in the community think.
> >
> > Gian
> >
>

Re: Code review

Reply via email to