Re: [freenet-dev] Github and code review

xor Thu, 02 Apr 2015 12:35:31 -0700

DISCLAIMER:
As the one who offended the fred review process, I *do not* participate in this 
discussion to manipulate your decision. It is up to you folks to decide what 
the review standard is.
I will comply to all rules for pull requests :)
If you say I shall squash, I will squash.

Instead, the information below is provided as a service so you have more input 
to use to come to a conclusion, as I feel that some aspects of squashing have 
not been mentioned by anyone else yet.
After I've explained the unmentioned disadvantages of squashing, at text 
position [1] I will also provide an alternative to squashing which you might 
want to discuss.

(Another reason for this mail is: I have the personal desire to excuse myself 
for causing the huge discussion. I'm sorry, seriously. So to have a chance to 
be forgiven, I want to a least explain why I objected so strongly to 
squashing.)

On Sunday, March 29, 2015 02:29:56 PM Bert Massop wrote:
> In short: I'm in favor of squashing when done in a sensible and
> information-preserving way.
> 
[...]
> 
> This is an interesting observation. There appears to be a certain
> hierarchy to which history can be considered useful for a particular
> purpose. IMHO rewriting / losing history is only a bad thing if the
> history actually serves a purpose: this depends on who will be using the
> log in the future, and for what purpose.
> 
> I really don't care about anyone's "Oops, fix XYZ" or "Also fix XYZ at
> ABC" commits (and nor should any other developer or reviewer) as long as
> the relevant code has not yet surfaced. Yes, this provides information
> as to how the code was originally created, but it serves no purpose in
> understanding, reviewing or maintaining the code.
> 
> I consider this kind of commits *noise* if they end up in the master
> tree (or in the initial commit set of a pull request), though they may
> be useful in my own branch during development (e.g. I can revert more
> granularly, read my work log, etc.). (FYI: I consider myself a noisy
> committer, too.) A pull request may require such commits to be added to
> the publicly known code, at that point they do add information — up to
> the point where the pull request is merged, from then on they no longer
> serve a purpose.

Unfortunately, this is short-sighted.

Consider a branch of a game is developed, it is called "add-vehicle-system". 
It contains the initial new (poor) pseudo code:

  class DrumBrake extends Brake { brake() { speed -= 10; } }
  class Bike extends Vehicle { DrumBrake b; void brake() {  b.brake(); }}
  class Hummer extends Car { DrumBrake b; void brake() {  b.brake(); }}
  class Porsche extends Car { DrumBrake b; void brake() {  b.brake(); }}

Now the code isn't merged yet, and some engineer comes along and notices that 
all the vehicles have copy-pasted and by-design poor drum brakes, so he adds 
commits:
  "Add disc brake"
  "Fix Hummer to use disc brake"

The code is now:
  class DrumBrake  extends Brake  { brake() { speed -= 10; } }
  class DiscBrake  extends Brake  { brake() { speed -= 100; } }
  class Bike  extends Vehicle { DrumBrake b; void brake() {  b.brake(); }}
  class Hummer extends Car { DiscBrake b; void brake() {  b.brake(); }}
  class Porsche extends Car { DrumBrake b; void brake() {  b.brake(); }}

Unfortunately, the fastest vehicle - the Porsche - was forgotten about and 
still has the poor drum brakes.

This isn't noticed yet though, the whole "add-vehicle-system" branch is done 
and thus is squashed and merged into a single commit:
"Add vehicle implementation"

Then someday, someone will notice that the Porsche brakes poorly, and look 
into the history of the code to figure out why this is.
Unfortunately, since all "Fix ... to use disc brake" commits are gone, he will 
NOT notice that a "Fix Porsche to use disc brake" commit was lacking.
Thus, he will think: "Well, this must have been intentional. I've also heard 
that in the 80s even some sports cars still had drum brakes. Guess its OK." 
and leave the Porsche with the poor brakes as is.

Now you might say that this is an academically constructed example. Yea well I 
cannot remember a practical one yet, sorry :)
But if you don't accept this, then what *is* the point of preserving *any* Git 
history anyway? The users only want the most recent version, just as you said. 
So we might as well delete *all* history.

But we don't, because the purpose of history is specifically to have *old* 
versions available so people can investigate the historical evolution of code 
to help with debugging.
So by demanding to delete old versions by squashing, you demand to invalidate 
the sole purpose of the history itself :)

Now as said, nevertheless, I don't want to cast a vote here.
I do understand that people are overwhelmed by the amount of code I have 
thrown at them, and I *will* squash pull requests when requested to.
Maybe it will work out just fine. I have only used squashing once, maybe I 
actually like it if I get more used to it. There are for sure change sets 
which are so simple that I can trust myself with squashing them. And I can 
also admit that sometimes I used to think "if it were only possible to make 
one from this two commits" when I didn't know that squashing existed yet :)

But still, there is an alternate way of doing what you want squashing for:
[1] For N commits to be squashed, create 1 sub-branch of your actual feature 
branch. Merge the branch into the feature branch on which you are actually 
working. Use "--no-ff" with git merge to ensure that it always creates a merge 
commit instead of fast-forwarding. In the merge commit message, put something 
like: "The commits on this branch do not have to be reviewed individually, you 
might review the full diff instead. The commits all together do that: ..."

So basically my suggestion is to inject many small branches which have the 
purpose of providing summaries for the reviewer.
Notice that this also emulates the Mercurial feature which Arne advertised:

On 28-03-15 17:13, Arne Babenhauserheide wrote:
> ¹: The evolve extension of Mercurial has the concepts of hidden
>    changesets with rewrite-markers, so the history which is shown by
>    default is clean, but you can always inquire how it came into
>    being.

Also notable: These micro-branches could probably be done using "git rebase" 
*post mortem* (see [2]). I.e. if a pull request is too large, the sender might 
afterwards rewrite history using rebase to inject more such branches, and then 
re-send the pull request with more branches.
Also, you might not only inject branches, but also re-order the commits so 
that the evil "Fix ..." commits happen right after the commit which introduced 
the code which is fixed upon - instead of having "Fix .." commits days worth of 
commits after the commit whose code they fixed. This will also help with 
sticking those "Fix.." commits all together in a branch.

So overall, I would say that:
1) rewriting history to inject branches
2) rewriting history to sort commits
can also provide a high cleanup of pull requests *without* deleting any 
information :)

With squashing, people requested history rewriting anyway, so by the injection 
& sorting we could at least do the rewriting in a way which does *not* delete 
information.

Of course this should be done with the usual precaution: Never rewrite history 
of a pushed branch, i.e. always create a new branch if you rebase.

Greetings & hope we can resolve the whole discussion to fulfill everyone's 
desire,
        xor

[2] If "git rebase" cannot inject branches itself, then it could at least be 
replaced by a large series of manual "git cherry-pick".

signature.asc
Description: This is a digitally signed message part.

_______________________________________________
Devl mailing list
Devl@freenetproject.org
https://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl

Re: [freenet-dev] Github and code review

Reply via email to