On Tue, Aug 09, 2016 at 03:29:33PM -0700, Junio C Hamano wrote:

> > @@ -1516,6 +1577,13 @@ static void get_object_details(void)
> >                     entry->no_try_delta = 1;
> >     }
> >  
> > +   /*
> > +    * This must happen in a second pass, since we rely on the delta
> > +    * information for the whole list being completed.
> > +    */
> > +   for (i = 0; i < to_pack.nr_objects; i++)
> > +           break_delta_cycles(&to_pack.objects[i]);
> > +
> >     free(sorted_by_offset);
> >  }
> A potential cycle can only come from reusing deltas across packs in
> an unstable order, that happens way before we do the find_delta()
> thing, so this is a good place to have the new call.  While reading
> break_delta_cycles(), I was wondering if what it does is safe under
> multi-threading but there is no need to worry.

It definitely is not multi-threaded safe (and I'm not sure it could
easily be made so). But yeah, it is quick and happens as part of the
get_object_details() look at the objects, which is where we make
decisions about delta reuse.

> The recursiveness of break-delta-cycles is not too bad for the same
> reason why it is OK to recurse in check_delta_limit(), I would guess?

Yes, and for the same reason that it is OK in write_one(); the recursion
is limited by the depth of the delta chains.

> This is not new with this change, but I am not quite sure what in
> the current code prevents us from busting the delta limit for reused
> ones, though.

I think in the current code you are limited by the depth you might find
in a single existing pack (since we never reuse cross-pack deltas).
There is a comment in check_object() that claims:

        * Depth value does not matter - find_deltas() will
        * never consider reused delta as the base object to
        * deltify other objects against, in order to avoid
        * circular deltas.

but I do not recall seeing code to enforce that (but presumably that is
just beacuse it is a corner of the code I have not seen yet).

However, I think with cross-pack deltas, you could have a situation

  pack 1: A -> B -> C
  pack 2: C -> D -> E

and pick A and B from the first pack, and C, D, and E from the second.
Then you end up with:

  A -> B -> C -> D -> E

in the output. IOW, I think the absolute worst case chain is the
max_depth times the number of packs. In practice I'd expect it to be
much smaller.

I'm not sure how much we should be worried about it. We could fill in
the depth values when adding a reused delta, but I don't think we have
the number handy; we'd have to actually walk the chain (though with
delta-base-offset, that is reasonably cheap).

> > I think my preference is to clean up the "hacky" bit of this patch, and
> > then apply the earlier MRU patch on top of it (which takes my repack
> > from 44 minutes to 5 minutes for this particular test set).
> Yup, with something like this to break the delta chain _and_ allow
> an object to go through the usual deltify machinery, I'd say the MRU
> patch is a wonderful thing to have.

Here are cleaned-up patches. The cycle-breaking patch is cleaned up and
has tests. I erred on the side of verbosity in the comments there,
because it literally took me hours to come up with a reliable working
set of commands (and a convincing argument that it's correct). Hopefully
it doesn't put anyone to sleep. :)

The second patch is the same as before, though I tweaked the commit
message a bit, so please replace what is at the tip of

  [1/2]: pack-objects: break delta cycles before delta-search phase
  [2/2]: pack-objects: use mru list when iterating over packs

To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to