Philip Martin wrote on Thu, 13 Jul 2017 21:36 +0100:
> Branko Čibej <br...@apache.org> writes:
> 
> > Whether this actually forces a format bump or not is a different
> > question which I don't know the answer to.
> 
> I think we would have to bump.  The old code could either read the
> pre-delta or the post-delta files, depending on how we decided to name
> things, but not both.  Either way the old code would not be able to read
> all the revision files and the repository would look broken.

If we invent a "second form of revision file distinguished by name
or path", then yes, we would require a format bump, to ensure all
readers know to cope with the situation that the revision file has been
unlinked from the currently-well-known name.  It would also require us
to figure out how to update all codepaths that open a revision file, to
do the correct triple lookup (old name, new name, packed name).
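
For illustration only, the triple lookup might be sketched as below, in
Python rather than Subversion's C.  The directory layout is simplified,
the ".delta" suffix for the "new name" is purely hypothetical (no such
name has been proposed), and both function names are made up for this
sketch.

```python
import os
from typing import List, Optional


def candidate_paths(revs_dir: str, rev: int, shard_size: int = 1000) -> List[str]:
    """List the places a revision might live, in lookup order.

    Simplified model of a sharded FSFS 'revs' directory.  The '.delta'
    suffix is a hypothetical stand-in for the 'new name'.
    """
    shard = str(rev // shard_size)
    return [
        os.path.join(revs_dir, shard, str(rev)),             # old (current) name
        os.path.join(revs_dir, shard, f"{rev}.delta"),       # hypothetical new name
        os.path.join(revs_dir, f"{shard}.pack", "pack"),     # packed shard
    ]


def find_rev_file(candidates: List[str]) -> Optional[str]:
    """Return the first candidate that exists on disk, or None."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None
```

Every codepath that opens a revision file would have to go through
something like find_rev_file(), which is the part that makes this
approach invasive.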

When I said format bump wouldn't be required, I envisioned that the rev
file that contains a PLAIN rep could be replaced by a rev file that
contains a DELTA rep, *if the DELTA rep is shorter*.  A replacement rev
file could be prepared (and atomically renamed into place) that replaces
the PLAIN rep by the shorter DELTA rep, and updates the unexpanded-len
member of the node-rev header.  That would result in some never-read
padding bytes, but FSFS f7's packing operation could regain them.  (If
the number of digits of unexpanded-len changed, the replacement rev file
would need to add some padding to ensure the number of bytes in the
node-rev header — and hence, offsets to the remainder of the file —
don't change.)
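
A minimal sketch of the byte-level idea, in Python rather than
Subversion's C.  Both helper names are hypothetical.  The filler byte
and the zero-padding of the length field are assumptions made for the
sketch; whether existing parsers accept a zero-padded number has not
been checked here.

```python
def replace_rep_in_place(rev_data: bytes, start: int, old_len: int,
                         new_rep: bytes, filler: bytes = b"\n") -> bytes:
    """Overwrite rev_data[start:start+old_len] with new_rep plus filler.

    The result has the same total length as the input, so every offset
    after the replaced region stays valid.  The filler bytes are the
    never-read padding.
    """
    if len(filler) != 1:
        raise ValueError("filler must be exactly one byte")
    if len(new_rep) > old_len:
        raise ValueError("replacement must not be longer than the original")
    padding = filler * (old_len - len(new_rep))
    return rev_data[:start] + new_rep + padding + rev_data[start + old_len:]


def pad_length_field(old_text: str, new_value: int) -> str:
    """Render new_value using exactly as many characters as old_text.

    Assumption: left-padding with zeros is acceptable to readers.
    """
    text = str(new_value)
    if len(text) > len(old_text):
        raise ValueError("new value needs more digits than the old field")
    return text.rjust(len(old_text), "0")
```

For example, pad_length_field("1234", 87) gives "0087", so the header
keeps its length when the stored size drops from four digits to two.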

Existing readers don't care whether a rep is a DELTA rep or a PLAIN rep;
they just care that it starts at the given byte offset, that it has
"ENDREP\n" after the given length, and that the resulting file checksums
to the given value.
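
Those three conditions can be sketched roughly as follows, in Python
rather than Subversion's C.  This is a simplified model, not the real
reader: rep_is_acceptable() is a hypothetical name, the header is
assumed to be a single line whose first word is the rep kind, and the
decoding of each kind is passed in by the caller (real DELTA reps are
svndiff data, which is not implemented here).

```python
import hashlib
from typing import Callable, Dict

TRAILER = b"ENDREP\n"


def rep_is_acceptable(rev_data: bytes, offset: int, length: int,
                      expected_md5: str,
                      expanders: Dict[str, Callable[[bytes], bytes]]) -> bool:
    """Check a rep the way the paragraph above describes.

    offset       -- byte offset of the rep's header line
    length       -- number of payload bytes following the header line
    expected_md5 -- hex MD5 of the expanded contents
    expanders    -- maps a rep kind ("PLAIN", "DELTA") to a decoder
    """
    header_end = rev_data.index(b"\n", offset) + 1
    header = rev_data[offset:header_end].split()
    if not header:
        return False
    kind = header[0].decode("ascii", "replace")

    # The trailer must sit immediately after the payload.
    payload_end = header_end + length
    if rev_data[payload_end:payload_end + len(TRAILER)] != TRAILER:
        return False

    # The kind only selects a decoder; the checks are otherwise identical.
    if kind not in expanders:
        return False
    contents = expanders[kind](rev_data[header_end:payload_end])
    return hashlib.md5(contents).hexdigest() == expected_md5
```

Nothing in the check depends on which kind of rep was originally
written at that offset, only on the offset, the length and the checksum
agreeing with what is found there.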

Now that I write this down I realize that rep-sharing complicates
matters.  The replacement would only be sound if the rep is not the
target of rep-sharing from another revision; that is easily handled by
only adding the rep to rep-cache.db after replacing the PLAIN rep by the
equivalent, shorter DELTA rep.  The remaining problem is what to do if
the rep is shared between two noderevs inside a single revision, but
<handwave>that's solvable</handwave>.

Regarding the recompress-at-pack alternative, I note that we (= the 1.9
release notes) recommend packing FSFS f7 repositories regularly.

Cheers,

Daniel
