On Sep 25, 2016, at 18:39, Linus Torvalds wrote:

The kernel, these days, is at roughly 5 million objects, and while the
seven hex digits are still often enough for uniqueness (and git will
always add digits *until* it is unique), it's long been at the point
where I tell people to do

   git config --global core.abbrev 12

because even though git will extend the seven hex digits until the
object name is unique, that only reflects the *current* situation in
the repository. With 5 million objects and a very healthy growth rate,
a 7-8 hex digit number that is unique today is not necessarily unique
a month or two from now, and then it gets annoying when a commit
message has a short git ID that is no longer unique when you go back
and try to figure out what went wrong in that commit.

On Sep 25, 2016, at 20:46, Junio C Hamano wrote:

Linus Torvalds <torva...@linux-foundation.org> writes:

> I can just keep reminding kernel maintainers and developers to update
> their git config, but maybe it would be a good idea to just admit that
> the defaults picked in 2005 weren't necessarily the best ones
> possible, and those could be bumped up a bit?

I am not quite sure how good any new default would be, though.  Just
like any timeout is not long enough for somebody, growing projects
will eventually hit whatever abbreviation length they start with.

This made me curious what the situation is really like. So I crunched some data.

Using a recent clone of $korg/torvalds/linux:

$ git rev-parse --verify d597639e203
error: short SHA1 d597639e203 is ambiguous.
fatal: Needed a single revision

So the kernel already has 11-character "short" SHA1s that are ambiguous. Is a core.abbrev setting of 12 really good enough?
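
As a rough cross-check on that question, a birthday-style estimate gives the expected number of object pairs that share a prefix of a given length. This is only a sketch: it assumes object names behave as uniformly random hex strings, it plugs in the 5 million object figure quoted above, and the function name is made up for illustration.

```python
def expected_colliding_pairs(num_objects, hex_digits):
    """Expected number of object pairs sharing the same prefix of
    `hex_digits` hex digits, assuming uniformly random object names
    (an assumption, not a measurement of any real repository)."""
    pairs = num_objects * (num_objects - 1) / 2  # number of distinct pairs
    return pairs / 16 ** hex_digits              # chance a given pair matches

if __name__ == "__main__":
    # 5 million objects is the figure quoted for the kernel above.
    for digits in range(7, 13):
        print(digits, expected_colliding_pairs(5_000_000, digits))
```

Under those assumptions, 12 digits comes out at about 0.04 expected pairs for 5 million objects. The estimate grows with the square of the object count, so doubling the repository roughly quadruples it.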

Here are the stats on the kernel's repository:

Ambiguous length 11 (but not at length 12) info:
  prefixes:       2
                  0 (with 1 or more commit disambiguations)

Ambiguous length 10 (but not at length 11) info:
  prefixes:      12
                  3 (with 1 or more commit disambiguations)
                  0 (with 2 or more commit disambiguations)

Ambiguous length 9 (but not at length 10) info:
  prefixes:     186
                 43 (with 1 or more commit disambiguations)
                  1 (with 2 or more commit disambiguations)
                  0 (with 3 or more disambiguations)

Ambiguous length 8 (but not at length 9) info:
  prefixes:    2723
                651 (with 1 or more commit disambiguations)
                 40 (with 2 or more commit disambiguations)
                  1 (with 3 or more disambiguations)
  maxambig:       3 (there is 1 of them)

Ambiguous length 7 (but not at length 8) info:
  prefixes:   41864
               9842 (with 1 or more commit disambiguations)
                680 (with 2 or more commit disambiguations)
                299 (with 3 or more disambiguations)
  maxambig:       3 (there are 299 of them)

The "maxambig" value is the largest number of objects that any single prefix of that length matches. For example, at length 7 there are 299 prefixes that each match 3 objects.
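
For anyone who wants to reproduce numbers of this kind, here is a minimal sketch of one way to count them. It is not necessarily how the stats above were generated. It assumes "ambiguous at length N but not at length N+1" means a prefix shared by two or more objects that all become distinct one digit later, and the function and key names are hypothetical.

```python
from collections import defaultdict

def ambiguity_stats(objects, length):
    """objects: iterable of (hex_object_name, object_type) pairs.
    Counts prefixes that are ambiguous at `length` but fully resolved
    at `length + 1` (one interpretation of the tables above)."""
    groups = defaultdict(list)
    for oid, otype in objects:
        groups[oid[:length]].append((oid, otype))

    stats = {"prefixes": 0, "commit1": 0, "commit2": 0, "maxambig": 0}
    for members in groups.values():
        if len(members) < 2:
            continue  # prefix is unique at this length
        longer = {oid[:length + 1] for oid, _ in members}
        if len(longer) < len(members):
            continue  # some objects still collide one digit later
        commits = sum(1 for _, otype in members if otype == "commit")
        stats["prefixes"] += 1
        stats["commit1"] += commits >= 1  # at least one commit involved
        stats["commit2"] += commits >= 2  # at least two commits involved
        stats["maxambig"] = max(stats["maxambig"], len(members))
    return stats
```

The input pairs could come from the first two fields of each line printed by `git cat-file --batch-all-objects --batch-check`, which lists object name, type and size for every object in the repository.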

Just out of curiosity, generating stats on the Git repository gives:

Ambiguous length 8 (but not at length 9) info:
  prefixes:       7
                  3 (with 1 or more commit disambiguations)
                  2 (with 2 or more commit disambiguations)
                  0 (with 3 or more disambiguations)

Ambiguous length 7 (but not at length 8) info:
  prefixes:      87
                 36 (with 1 or more commit disambiguations)
                  3 (with 2 or more commit disambiguations)
                  0 (with 3 or more disambiguations)

Running the stats on $github/gitster/git produces some ambiguous length 9 prefixes (one of which contains a commit disambiguation).

--Kyle
