On Fri, Jul 29, 2016 at 08:02:51AM -0700, Junio C Hamano wrote:
> > So it's possible that the resulting pack
> > is not as small as it could be (i.e., we break the chain with a base
> > object, but it's possible if we looked that we could have broken the
> > chain by making a delta against an existing base object). So I wonder if
> > it's possible to detect this case earlier, during the "can we reuse this
> > delta" bits of check_object().
> I'd let the issue simmer in my mind a bit for now, as I do not
> think of a neat trick offhand.
Sorry, I let this go a bit longer than intended. Here's where I'm at
with my thinking.
To recap the issue for those just joining us, the series in question
changes the order in which pack-objects will look at the existing
packfiles (in order to make the counting step faster when you have many
packs). This can introduce cycles in reused deltas because we might find
"A" as a delta of "B" in one pack, but "B" as a delta of "A" in another
pack. Whereas the current code, with a static pack ordering, will always
find such an "A" and "B" in the same pack, so as long as that pack does
not have a cycle, our set of reused deltas won't either.
The current code does detect the cycle, but not until the write phase
(at which point we issue a warning, throw out the delta, and output the
Here's a list of approaches I think we can use to fix this:
1. squelch the warning and ignore it. The downside here, besides not
warning the user about true in-pack cycles, is that we have no
opportunity to actually find a new delta (because we realize the
problem only after the delta-compression phase).
My test repository is a bad packing of all of the forks of
torvalds/linux, with 3600 packs. I'm happy to share it if anybody
wants to see it, but note that it is 11GB.
The space overhead of the resulting pack in this case is ~3.2%
(versus a pack generated by the original code, using the static
pack order). Which is really not that bad, but I'm sure there are
more pathological cases (i.e., there were on the order of hundreds
or maybe thousands of cycles that needed broken, out of about 15
million total objects; but one could imagine the worst case as
2. break the cycles in check_object(), when we consider whether we can
reuse the delta.
The obvious advantage here (over breaking it in the writing phase)
is that we can then feed the object into the normal
The question is: how?
2a. One way to do so is to provide a total ordering over the packs.
I.e., if "A" is a delta of "B", then allow the delta to be
reused iff the pack containing "A" sorts before or equal to the
pack containing "B". The actual ordering doesn't matter for
purposes of the cycle-breaking, as long as its consistent.
I implemented this using the mtime of the packs (breaking ties
by their names), as that would give us results close to the
Unsurprisingly, this adds a bit of CPU time (because we have
more delta objects to search), though it's dwarfed by the wins
you get from the sped-up counting phase (in my tests, 20
seconds of extra delta search for 39 minutes of "counting
But it doesn't actually help the resulting pack size. It's
still about 3.2% overhead (it's actually just slightly worse
than case 1).
I think what happens is that we throw out a lot of deltas, even
ones that _aren't_ cycles, but just happened to be found in
packs that are "backwards" in the total order. And then we have
to do a delta search on them, but of course that can't always
find a good match.
I also tried two other variants here. The pack order for the
cycle-breaking step shouldn't matter, so I wondered what would
happen if I reversed it. It does indeed produce different
results. The resulting pack is slightly better, at 2.6%
overhead. But we spend even more time looking for deltas (IOW,
we threw out more of them).
I also tried bumping the pack window size, under the theory
that maybe we just aren't finding great deltas for the ones we
throw out. But it didn't seem to make much difference.
So I like the simplicity of this idea (which, by the way, was
not mine, but came from an off-list discussion with Michael).
But I think it's a dead-end, because it throws away too many
deltas that _could_ be reused.
2b. Another option is to do real cycle analysis, and break the
This is tricky to do in check_object() because we are actually
filling out the delta pointers as we go. So I think we would
have to make two passes: one to come up with a tentative list
of deltas to reuse, and then one to check them for cycles.
If done right, the cycle check should be linear in the number
of objects (it's basically a depth-first search). In fact, I
think it would look a lot like what write_object() is doing.
We'd just be moving the logic _before_ the delta-compression
phase, instead of after.
This is the one approach I've considered but have not yet
implemented. So no numbers.
3. Pick a consistent pack order up front, and stop changing it
The question, of course, is which order.
For my test repo, this is actually pretty easy. There's one
gigantic pack with 15 million objects in it, and then 3600 small
packs with a handful of objects from pushes. The giant pack is the
oldest, so the normal "reverse chronological" pack order makes
counting effectively O(m*n), because the majority of the objects
are found in the very last pack.
So an obvious thing to try is just reversing it. And indeed, the
counting is quite fast then (similar to the MRU numbers). Though of
course one can come up with a case where the objects are more
evenly split across the packs, and it would not help (you can come
up with pathological cases for MRU, too, but they're much less
likely in practice, because it's really exploiting locality).
But I was surprised to find that the resulting pack is even worse,
at 4.5% overhead.
I can't really explain that, and I'm starting to come to the
conclusion that there's a fair bit of luck and black magic involved
in delta reuse. I would not be at all surprised to find a test case
where reversing the order actually _improved_ things.
It may be that the numbers I saw in my (2a) tests were not because
we broke deltas and couldn't find replacements, but simply because
we get more and better deltas by looking in the smaller, newer
So that's where I'm at. Mostly puzzled, and wondering if any of my
experiments were even valid or showing anything interesting, and not
just somewhat random artifacts of the deltas that happen to be in this
particular test case. Worse, it may be that looking in the small packs
_is_ objectively better, and we're losing that in any kind of
pack-ordering changes (which is an inherent part of my speed-up strategy
for the counting phase).
I've yet to implement 2b, true cycle-detection before delta-compression.
But I have a feeling it will somehow show that my resulting pack is
about 3% worse. :-/
 So obviously one option is to figure out a way to speed up the
counting phase without changing the pack order. The only way I can
think of is to build a master object index, where each entry has all
of the packs that a given object can be found in, in their correct
That would be fairly easy to implement as a hash table, but:
- it would require a fair bit of memory; the pack .idx for this
repo is 500MB, and we usually get to just mmap it. OTOH, the
biggest part of that is the sha1s, so we could possibly just
reference them by pointer into the mmap'd data. And it's not
like you don't need much more than 500MB to hold the list of
objects to pack.
- it's inherently O(nr_of_objects_in_repo) in CPU and memory to
build the index, whereas some operations are
O(nr_of_objects_to_be_packed). In this case it's probably OK (we
do pay for extra entries for each of the duplicates, but it's
largely on the order of 15 million objects). But it's a big
expense if you're just packing a few of the objects.
So I dunno. I really like the MRU approach if we can salvage it.
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html