[Bitcoin-development] Code review

2013-10-04 Thread Mike Hearn
Git makes it easy to fork peoples work off and create long series of
commits that achieve some useful goal. That's great for many things.
Unfortunately, code review is not one of those things.

I'd like to make a small request - when submitting large, complex pieces of
work for review, please either submit it as one giant squashed change, or
be an absolute fascist about keeping commits logically clean and separated.
It really sucks to review things in sequence and then discover that some
code you spent some time thinking about or puzzling out got
deleted/rewritten/changed in a later commit. It also can make it harder to
review things when later code uses new APIs or behaviour changes introduced
in earlier commits - you have to either keep it all in your head, do lots
of tab switching, or do a squash yourself (in which case every reviewer
would have to manually do that).

On a related note, github seems to have lost the plot with regards to code
review - they are spending their time adding 3D renderers to their diff
viewer but not making basic improvements other tools had for years.

So, I'd like to suggest the idea of using Review Board:

http://www.reviewboard.org/

It's an open source, dedicated code review tool used by lots of big name
companies for their internal work. It has git[hub] integration and a lot of
very neat features, like the ability to attach screenshots to reviews. Also
more basic ones, like side by side diffs. Branches can be and often are
submitted to the system as single reviews.

The company behind it (disclosure - written and run by a long time friend
of mine) offers hosting plans, but we could also host it on a Foundation
server instead.
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Code review

2013-10-04 Thread Andy Parkins
On Friday 04 October 2013 12:30:07 Mike Hearn wrote:
 Git makes it easy to fork peoples work off and create long series of
 commits that achieve some useful goal. That's great for many things.
 Unfortunately, code review is not one of those things.
 
 I'd like to make a small request - when submitting large, complex pieces of
 work for review, please either submit it as one giant squashed change, or

Don't do this.  It throws away all of the good stuff that git lets you record.  
There is more to a git branch than just the overall difference.  Every single 
log message and diff is individually valuable.  It's easy to make a squashed 
diff from many little commits; it's impossible to go the other way.

Command line for you so you don't have to think about it:

  git diff $(git merge-base master feature-branch) feature-branch 

git-merge-base finds the common ancestor between master and feature-branch, 
and then compares feature-branch against that.


Andy

-- 
Dr Andy Parkins
andypark...@gmail.com


--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Code review

2013-10-04 Thread Mike Hearn
 There is more to a git branch than just the overall difference.  Every
 single
 log message and diff is individually valuable.


When the log messages don't accurately describe the contents of the diff,
it's just misinformation and noise. Everyone starts out by wanting a neat
collection of easy to understand and review commits, but in practice it's
extremely hard to always get it.

I know how to make squashed commits, thanks. I've done LOTS of code review
in my life. I'm making a point here as one of the few people who goes
through large pull requests and reviews them line by line. It's hard,
partly because github sucks, and partly because reviewing lots of small
commits sucks.

There's nothing that makes a single large commit harder to review. It's the
same amount of code or strictly less, given the tendency for later commits
to change earlier ones. You can easily search the entire change whilst
reviewing. There are lots of things that make it easier.

FWIW inside Google the code review process is one-commit-one-review.
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Code review

2013-10-04 Thread Peter Todd
On Fri, Oct 04, 2013 at 11:42:29AM +0100, Andy Parkins wrote:
 On Friday 04 October 2013 12:30:07 Mike Hearn wrote:
  Git makes it easy to fork peoples work off and create long series of
  commits that achieve some useful goal. That's great for many things.
  Unfortunately, code review is not one of those things.
  
  I'd like to make a small request - when submitting large, complex pieces of
  work for review, please either submit it as one giant squashed change, or
 
 Don't do this.  It throws away all of the good stuff that git lets you 
 record.  
 There is more to a git branch than just the overall difference.  Every single 
 log message and diff is individually valuable.  It's easy to make a squashed 
 diff from many little commits; it's impossible to go the other way.
 
 Command line for you so you don't have to think about it:
 
   git diff $(git merge-base master feature-branch) feature-branch 
 
 git-merge-base finds the common ancestor between master and feature-branch, 
 and then compares feature-branch against that.

Git is a revision *communication* system that happens to also make for a
good revision *control* system.

Remember that every individual commit is two things: what source code
has changed, and a message explaining why you thought that change should
be made. Commits aren't valuable in of themselves, they're valuable
because they serve to explain to the other people you are working with
why you thought a change should be made. Sometimes it makes sense to
explain your changes in 10 commits, sometimes it makes sense to squash
them all up into one commit, but there's no hard and fast rule other
than Put yourself in your fellow coders' shoes - what's the best way to
explain to them what you are trying to accomplish and why? You may have
generated a lot of little commits in the process of creating your patch
that tell a story that no-one else cares about, or equally by squashing
everything into one big commit you wind up with a tonne of changes with
little explanation as to why they were made.

Two caveats apply however: git-bisect works best if every commit in the
tree you are trying to debug works well enough that you can run tests
without errors - that is you don't break the build. Don't make commits
that don't compile at the very least, and preferably everything you do
should be refactored to the point where the commit as a whole works.

The second caveat is more specific to Bitcoin: people tend to rebase
their pull-requests over and over again until they are accepted, but
that also means that code review done earlier doesn't apply to the later
code pushed. Bitcoin is a particularly high profile, and high profit,
target for people trying to get malicious code into the codebase. It may
be the case that we would be better off asking reviewers making small
changes to their pull-requests to add additional commits for those
changes rather than rebasing, to make it clear what changes were
actually made so that code reviewers don't have to review the whole
patch from scratch. After all, the place where the most eyes will
actually look at the commits is during the pull-req process; after the
code has been pulled the audience for those commits is in most cases
almost no-one.

Having said that, there's currently a lot of other holes in the review
and source code integrity process, so fixing this problem is probably
not the low-hanging fruit right now.


FWIW personally I tend to review patches by both looking at the
individual commits to try to understand why someone wanted to make a
change, as well as all commits merged into one diff for a what actually
changed here? review.

-- 
'peter'[:-1]@petertodd.org


signature.asc
Description: Digital signature
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Code review

2013-10-04 Thread Peter Todd
On Fri, Oct 04, 2013 at 12:30:07PM +0200, Mike Hearn wrote:
 Git makes it easy to fork peoples work off and create long series of
 commits that achieve some useful goal. That's great for many things.
 Unfortunately, code review is not one of those things.
 
 I'd like to make a small request - when submitting large, complex pieces of
 work for review, please either submit it as one giant squashed change, or
 be an absolute fascist about keeping commits logically clean and separated.
 It really sucks to review things in sequence and then discover that some
 code you spent some time thinking about or puzzling out got
 deleted/rewritten/changed in a later commit. It also can make it harder to
 review things when later code uses new APIs or behaviour changes introduced
 in earlier commits - you have to either keep it all in your head, do lots
 of tab switching, or do a squash yourself (in which case every reviewer
 would have to manually do that).

When I'm reviewing multiple commit pull-requests and want to see every
change made, I always either click on the Files Changed tab on github,
which collapses every commit into a single diff, or do the equivalent
with git log.

Why doesn't that work for you?

 On a related note, github seems to have lost the plot with regards to code
 review - they are spending their time adding 3D renderers to their diff
 viewer but not making basic improvements other tools had for years.
 
 So, I'd like to suggest the idea of using Review Board:
 
 http://www.reviewboard.org/
 
 It's an open source, dedicated code review tool used by lots of big name
 companies for their internal work. It has git[hub] integration and a lot of
 very neat features, like the ability to attach screenshots to reviews. Also
 more basic ones, like side by side diffs. Branches can be and often are
 submitted to the system as single reviews.
 
 The company behind it (disclosure - written and run by a long time friend
 of mine) offers hosting plans, but we could also host it on a Foundation
 server instead.

One advantage of using github is that they're an independent third
party; we should think carefully about the risks of furthering the
impression that Bitcoin development is a closed process by moving the
code review it to a server that we control with explicit review groups.

Given that Review Board appears to remain cryptographically unverifiable
there may also be disadvantages in operating it ourselves in that if the
review server does get compromised we *don't* have a third-party to
blame. In addition GitHub is a third-party with a very valuable
reputation to uphold and full-time staff - they're doing a better job of
keeping their servers secure and running then we ever could.

-- 
'peter'[:-1]@petertodd.org


signature.asc
Description: Digital signature
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Code review

2013-10-04 Thread Mike Hearn
On Fri, Oct 4, 2013 at 1:53 PM, Peter Todd p...@petertodd.org wrote:

 When I'm reviewing multiple commit pull-requests and want to see every
 change made, I always either click on the Files Changed tab on github,
 which collapses every commit into a single diff, or do the equivalent
 with git log.

 Why doesn't that work for you?


The files changed tab definitely works better for reading. In the past
comments I put there have disappeared, but I think that can also be true of
comments put on the individual commit reviews (which is another issue with
github, but it's unrelated to how the commits are presented). So I have
lost trust in doing reviews that way. It does make things easier to read
though.

One advantage of using github is that they're an independent third
 party; we should think carefully about the risks of furthering the
 impression that Bitcoin development is a closed process by moving the
 code review it to a server that we control with explicit review groups.


I guess anyone would be able to sign up and comment.
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Code review

2013-10-04 Thread Peter Todd
On Fri, Oct 04, 2013 at 01:58:51PM +0200, Arto Bendiken wrote:
 On Fri, Oct 4, 2013 at 1:35 PM, Peter Todd p...@petertodd.org wrote:
  The second caveat is more specific to Bitcoin: people tend to rebase
  their pull-requests over and over again until they are accepted, but
  that also means that code review done earlier doesn't apply to the later
  code pushed. Bitcoin is a particularly high profile, and high profit,
  target for people trying to get malicious code into the codebase.
 
 On that note, this 2003 example of an attempt to backdoor the Linux
 kernel is pertinent:
 
 http://lwn.net/Articles/57135/
 
 The backdoor in question came down to a single missing character,
 easily overlooked by a reviewer if a spotlight hadn't been thrown on
 it for other reasons. Compromising a Bitcoin implementation isn't
 going to be as easy as that, one would hope, but certainly it seems
 only a matter of time until there's an attempt at it.

Exactly.

Ideally code review discussions would be PGP signed and have a mechanism
for someone to sign a commit saying they had in fact reviewed it.
Combined with git's per-commit signature mechanism it'd make it possible
to write a git-pull hook that checked that whatever was being pulled had
some sufficient number of signatures from people whose reviews you
trusted. With such a system you could host code review anywhere safely,
or for that matter, use a completely distributed system.

But that's going to be a long way off. In the meantime github is
probably more trustworthy and competent than anything we ran ourselves,
and we should focus on making sure reviewers eyeballs actually look at
the code that ends up in master.

-- 
'peter'[:-1]@petertodd.org


signature.asc
Description: Digital signature
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Code review

2013-10-04 Thread Eugen Leitl
On Fri, Oct 04, 2013 at 02:14:19PM +0200, Mike Hearn wrote:

 One advantage of using github is that they're an independent third
  party; we should think carefully about the risks of furthering the
  impression that Bitcoin development is a closed process by moving the
  code review it to a server that we control with explicit review groups.
 
 
 I guess anyone would be able to sign up and comment.

It's a long shot, but have any of you looked into Fossil?

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Code review

2013-10-04 Thread Arto Bendiken
On Fri, Oct 4, 2013 at 1:35 PM, Peter Todd p...@petertodd.org wrote:
 The second caveat is more specific to Bitcoin: people tend to rebase
 their pull-requests over and over again until they are accepted, but
 that also means that code review done earlier doesn't apply to the later
 code pushed. Bitcoin is a particularly high profile, and high profit,
 target for people trying to get malicious code into the codebase.

On that note, this 2003 example of an attempt to backdoor the Linux
kernel is pertinent:

http://lwn.net/Articles/57135/

The backdoor in question came down to a single missing character,
easily overlooked by a reviewer if a spotlight hadn't been thrown on
it for other reasons. Compromising a Bitcoin implementation isn't
going to be as easy as that, one would hope, but certainly it seems
only a matter of time until there's an attempt at it.

Following these code review discussions with much interest.

-- 
Arto Bendiken | @bendiken | http://ar.to/

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Code review

2013-10-04 Thread Andy Parkins
On Friday 04 October 2013 13:32:47 you wrote:
  There is more to a git branch than just the overall difference.  Every
  single
  log message and diff is individually valuable.
 
 When the log messages don't accurately describe the contents of the diff,
 it's just misinformation and noise. Everyone starts out by wanting a neat
 collection of easy to understand and review commits, but in practice it's
 extremely hard to always get it.

Then your request should be for better commits, not for just squashing the lot 
into some incoherent blob.

The alternatives under discussion are:

 - Coder produces long chain of commits on feature branch.  Compresses them, 
throwing away any individual and accurate messages into one large diff.  It's 
unlikely you'll get a log message that is as descriptive in the large one if 
you made them throw away the little ones.  Large diff is offered for review.  
Review is of one large diff.

 - Coder produces long chain on commits on feature branch.  Offers them for 
review.  Reviewer only likes to review large diffs, so uses the tools 
available to produce it.

Exactly the same diff is being reviewed, but in one case you're throwing away 
information.  There is no getting that information back ever.

You're also discarding the advantages of individual commits.

 - Merges are considerably harder than rebases.  You have to resolve all the 
conflicts at once with a merge, with a rebase you can resolve them with the 
log message and original isolated diff to help you.

 - Bisect doesn't give as fine-grained an answer.

 I know how to make squashed commits, thanks. I've done LOTS of code review

Excellent.  Don't take it personally -- I only offered it in case you didn't 
know.  Not everyone is familiar with git plumbing.

 in my life. I'm making a point here as one of the few people who goes
 through large pull requests and reviews them line by line. It's hard,

That doesn't make you the only person who does code reviews.  I do plenty of 
reviews here; they're just not bitcoin reviews.  Obviously we're talking about 
bitcoin, so you get to decide in the end.

 partly because github sucks, and partly because reviewing lots of small
 commits sucks.

I'm not suggesting you review lots of small commits anyway.  I can't comment 
on whether github sucks or not -- that's obviously personal preference.  
However, nothing stops you doing reviews on your own local checkout.

 There's nothing that makes a single large commit harder to review. It's the
 same amount of code or strictly less, given the tendency for later commits

That's not true.  There are often lots of small changes that are manifestly 
correct -- let's use string changes as an example -- in the large commit, they 
are just noise.  You want to be able to focus on the hard commits.  However -- 
I am not trying to persuade you to review small commits, I'm trying to 
persuade you not to throw away the small commits, gone forever, merely because 
your preference is to review large commits.

 to change earlier ones. You can easily search the entire change whilst
 reviewing. There are lots of things that make it easier.

Since the large commit is always available, no facilities have been lost.

Personally I work hard in my repositories to make coherent, small, well 
described commits.  If I had gone to that effort for a bitcoin branch only to 
be told to collapse them all and throw away that effort, I'd think I'd been 
wasting my time.



Andy

-- 
Dr Andy Parkins
andypark...@gmail.com

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Code review

2013-10-04 Thread Gavin Andresen
On Fri, Oct 4, 2013 at 8:30 PM, Mike Hearn m...@plan99.net wrote:

 I'd like to make a small request - when submitting large, complex pieces
 of work for review, please either submit it as one giant squashed change,
 or be an absolute fascist about keeping commits logically clean and
 separated.


I'll try harder to be a fascist (it doesn't come naturally to me). HUGE
thanks for taking the time to review the fee changes in detail.

RE: using Review Board:

I'm all for using better tools, if they will actually get used. If a
potential reviewer has to sign up to create a Review Board account or learn
Yet Another Tool, then I think it would be counter-productive:  we'd just
make the pool of reviewers even smaller than it already is.

Are there good examples of other open source software projects successfully
incentivizing review that we can copy?

For example, I'm wondering if maybe for the 0.9 release and onwards the
Thank you section should thank only people who have significantly helped
test or review other people's code.

-- 
--
Gavin Andresen
--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development