On Sat, Feb 25, 2017 at 06:50:50PM +0000, brian m. carlson wrote:

> > As long as the reader can tell from the format of object names
> > stored in the "new object format" object from what era is being
> > referred to in some way [*1*], we can name new objects with only new
> > hash, I would think.  "new refers only to new" that stratifies
> > objects into older and newer may make things simpler, but I am not
> > convinced yet that it would give our users a smooth enough
> > transition path (but I am open to be educated and persuaded the
> > other way).
> 
> I would simply use multihash[0] for this purpose.  New-style objects
> serialize data in multihash format, so it's immediately obvious what
> hash we're referring to.  That makes future transitions less
> problematic.
> 
> [0] https://github.com/multiformats/multihash

I looked at that earlier, because I think it's a reasonable idea for
future-proofing. The first byte is a "varint", but I couldn't find where
they defined that format.

The closest I could find is:

  https://github.com/multiformats/unsigned-varint

whose README says:

  This unsigned varint (VARiable INTeger) format is for the use in all
  the multiformats.

    - We have not yet decided on a format yet. When we do, this readme
      will be updated.

    - We have time. All multiformats are far from requiring this varint.

which is not exactly confidence-inspiring. They also put the length at
the front of the hash. That's probably convenient if you're parsing an
unknown set of hashes, but I'm not sure it's helpful inside Git objects.
And there's an incentive to minimize header data at the front of a hash,
because every header byte is one more byte that every single hash will
share (so abbreviations all collide over it), and that people will have
to type when passing hashes to "git show", etc.

I'd almost rather use something _really_ verbose like

  sha256:1234abcd...

in all of the objects. And then when we get an unadorned hash from the
user, we guess it's sha256 (or whatever), and fall back to treating it
as a sha1.
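
A rough sketch of that lookup order, in Python for brevity. The names
`resolve` and `odb` are made up for illustration and are not anything in
Git; `odb` is assumed to map an algorithm name to the full hex object
names known for it:

```python
def resolve(name, odb):
    """Resolve a possibly prefixed, possibly abbreviated object name.

    Returns (algorithm, full_hex). `odb` is a hypothetical stand-in for
    the object database: {"sha256": [...], "sha1": [...]}.
    """
    if ":" in name:
        # Explicit "algo:hex" form: only that algorithm is consulted.
        algo, hexpart = name.split(":", 1)
        order = [algo]
    else:
        # Unadorned hash: guess the new hash first, then fall back to sha1.
        hexpart = name
        order = ["sha256", "sha1"]

    for algo in order:
        matches = [h for h in odb.get(algo, ()) if h.startswith(hexpart)]
        if len(matches) == 1:
            return algo, matches[0]
        if len(matches) > 1:
            raise ValueError("ambiguous %s name: %s" % (algo, hexpart))
    raise KeyError(name)
```

With this ordering an unadorned abbreviation that matches a sha256
object never reaches the sha1 lookup; the explicit prefix is the way to
ask for the sha1 object in that case.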

Using a syntactically-obvious name like that also solves one other
problem: there are sha1 hashes whose first bytes will encode as a "this
is sha256" multihash, creating some ambiguity.
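
To make the ambiguity concrete, here is a small sketch. It assumes the
multihash layout of one byte of function code followed by one byte of
digest length, with 0x12 as the code for sha2-256; the 20-byte "sha1"
value is built by hand for illustration rather than computed from real
content, and the helper names are hypothetical:

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for sha2-256
SHA2_256_LEN = 0x20   # digest length in bytes (32)

def multihash_sha256(data):
    """Serialize sha256(data) as <code><length><digest>."""
    header = bytes([SHA2_256_CODE, SHA2_256_LEN])
    return header + hashlib.sha256(data).digest()

def claims_sha256(raw):
    """True if the leading bytes read as a sha2-256 multihash header."""
    return raw[:2] == bytes([SHA2_256_CODE, SHA2_256_LEN])

real = multihash_sha256(b"hello")

# A 20-byte value that could be a raw sha1 (hand-built, not a real
# object name): its first two bytes happen to be 0x12 0x20.
sha1_lookalike = bytes([0x12, 0x20]) + bytes(18)
```

Both `real` and `sha1_lookalike` start with the hex digits "1220", so an
abbreviated name cannot tell them apart; only the total length (34 bytes
versus 20) does, and that is not available when the name is truncated.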

-Peff
