From: Jeff King <p...@peff.net>

Implement support for delta islands in git pack-objects
and document how delta islands work in
"Documentation/git-pack-objects.txt".

Signed-off-by: Jeff King <p...@peff.net>
Signed-off-by: Christian Couder <chrisc...@tuxfamily.org>
---
 Documentation/git-pack-objects.txt |  88 +++++++++++++++++++
 builtin/pack-objects.c             | 130 ++++++++++++++++++++---------
 2 files changed, 177 insertions(+), 41 deletions(-)

diff --git a/Documentation/git-pack-objects.txt 
b/Documentation/git-pack-objects.txt
index d95b472d16..7b7a36056f 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -289,6 +289,94 @@ Unexpected missing object will raise an error.
 --unpack-unreachable::
        Keep unreachable objects in loose form. This implies `--revs`.
 
+--delta-islands::
+       Restrict delta matches based on "islands". See DELTA ISLANDS
+       below.
+
+
+DELTA ISLANDS
+-------------
+
+When possible, `pack-objects` tries to reuse existing on-disk deltas to
+avoid having to search for new ones on the fly. This is an important
+optimization for serving fetches, because it means the server can avoid
+inflating most objects at all and just send the bytes directly from
+disk. This optimization can't work when an object is stored as a delta
+against a base which the receiver does not have (and which we are not
+already sending). In that case the server "breaks" the delta and has to
+find a new one, which has a high CPU cost. Therefore it's important for
+performance that the set of objects in on-disk delta relationships match
+what a client would fetch.
+
+In a normal repository, this tends to work automatically. The objects
+are mostly reachable from the branches and tags, and that's what clients
+fetch. Any deltas we find on the server are likely to be between objects
+the client has or will have.
+
+But in some repository setups, you may have several related but separate
+groups of ref tips, with clients tending to fetch those groups
+independently. For example, imagine that you are hosting several "forks"
+of a repository in a single shared object store, and letting clients
+view them as separate repositories through `GIT_NAMESPACE` or separate
+repos using the alternates mechanism. A naive repack may find that the
+optimal delta for an object is against a base that is only found in
+another fork. But when a client fetches, they will not have the base
+object, and we'll have to find a new delta on the fly.
+
+A similar situation may exist if you have many refs outside of
+`refs/heads/` and `refs/tags/` that point to related objects (e.g.,
+`refs/pull` or `refs/changes` used by some hosting providers). By
+default, clients fetch only heads and tags, and deltas against objects
+found only in those other groups cannot be sent as-is.
+
+Delta islands solve this problem by allowing you to group your refs into
+distinct "islands". Pack-objects computes which objects are reachable
+from which islands, and refuses to make a delta from an object `A`
+against a base which is not present in all of `A`'s islands. This
+results in slightly larger packs (because we miss some delta
+opportunities), but guarantees that a fetch of one island will not have
+to recompute deltas on the fly due to crossing island boundaries.
+
+Islands are configured via the `pack.island` option, which can be
+specified multiple times. Each value is a left-anchored regular
+expressions matching refnames. For example:
+
+-------------------------------------------
+[pack]
+island = refs/heads/
+island = refs/tags/
+-------------------------------------------
+
+puts heads and tags into an island (whose name is the empty string; see
+below for more on naming). Any refs which do not match those regular
+expressions (e.g., `refs/pull/123`) is not in any island. Any object
+which is reachable only from `refs/pull/` (but not heads or tags) is
+therefore not a candidate to be used as a base for `refs/heads/`.
+
+Refs are grouped into islands based on their "names", and two regexes
+that produce the same name are considered to be in the same island. The
+names are computed from the regexes by concatenating any capture groups
+from the regex (and if there are none, then the name is the empty
+string, as in the above example). This allows you to create arbitrary
+numbers of islands. For example, imagine you store the refs for each
+fork in `refs/virtual/ID`, where `ID` is a numeric identifier. You might
+then configure:
+
+-------------------------------------------
+[pack]
+island = refs/virtual/([0-9]+)/heads/
+island = refs/virtual/([0-9]+)/tags/
+island = refs/virtual/([0-9]+)/(pull)/
+-------------------------------------------
+
+That puts the heads and tags for each fork in their own island (named
+"1234" or similar), and the pull refs for each go into their own
+"1234-pull".
+
+Note that we pick a single island for each regex to go into, using "last
+one wins" ordering (which allows repo-specific config to take precedence
+over user-wide config, and so forth).
+
 SEE ALSO
 --------
 linkgit:git-rev-list[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ebc8cefb53..3bd41d8b05 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -24,6 +24,7 @@
 #include "streaming.h"
 #include "thread-utils.h"
 #include "pack-bitmap.h"
+#include "delta-islands.h"
 #include "reachable.h"
 #include "sha1-array.h"
 #include "argv-array.h"
@@ -59,6 +60,7 @@ static struct packing_data to_pack;
 
 static struct pack_idx_entry **written_list;
 static uint32_t nr_result, nr_written, nr_seen;
+static uint32_t write_layer;
 
 static int non_empty;
 static int reuse_delta = 1, reuse_object = 1;
@@ -93,6 +95,8 @@ static uint16_t write_bitmap_options;
 
 static int exclude_promisor_objects;
 
+static int use_delta_islands;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -607,7 +611,7 @@ static inline void add_to_write_order(struct object_entry 
**wo,
                               unsigned int *endp,
                               struct object_entry *e)
 {
-       if (e->filled)
+       if (e->filled || e->layer != write_layer)
                return;
        wo[(*endp)++] = e;
        e->filled = 1;
@@ -669,6 +673,7 @@ static void add_family_to_write_order(struct object_entry 
**wo,
 
 static struct object_entry **compute_write_order(void)
 {
+       uint32_t max_layers = 1;
        unsigned int i, wo_end, last_untagged;
 
        struct object_entry **wo;
@@ -700,51 +705,58 @@ static struct object_entry **compute_write_order(void)
         */
        for_each_tag_ref(mark_tagged, NULL);
 
-       /*
-        * Give the objects in the original recency order until
-        * we see a tagged tip.
-        */
+       if (use_delta_islands)
+               max_layers = compute_pack_layers(&to_pack);
+
        ALLOC_ARRAY(wo, to_pack.nr_objects);
-       for (i = wo_end = 0; i < to_pack.nr_objects; i++) {
-               if (objects[i].tagged)
-                       break;
-               add_to_write_order(wo, &wo_end, &objects[i]);
-       }
-       last_untagged = i;
+       wo_end = 0;
 
-       /*
-        * Then fill all the tagged tips.
-        */
-       for (; i < to_pack.nr_objects; i++) {
-               if (objects[i].tagged)
+       for (; write_layer < max_layers; ++write_layer) {
+               /*
+                * Give the objects in the original recency order until
+                * we see a tagged tip.
+                */
+               for (i = 0; i < to_pack.nr_objects; i++) {
+                       if (objects[i].tagged)
+                               break;
                        add_to_write_order(wo, &wo_end, &objects[i]);
-       }
+               }
+               last_untagged = i;
 
-       /*
-        * And then all remaining commits and tags.
-        */
-       for (i = last_untagged; i < to_pack.nr_objects; i++) {
-               if (oe_type(&objects[i]) != OBJ_COMMIT &&
-                   oe_type(&objects[i]) != OBJ_TAG)
-                       continue;
-               add_to_write_order(wo, &wo_end, &objects[i]);
-       }
+               /*
+                * Then fill all the tagged tips.
+                */
+               for (; i < to_pack.nr_objects; i++) {
+                       if (objects[i].tagged)
+                               add_to_write_order(wo, &wo_end, &objects[i]);
+               }
 
-       /*
-        * And then all the trees.
-        */
-       for (i = last_untagged; i < to_pack.nr_objects; i++) {
-               if (oe_type(&objects[i]) != OBJ_TREE)
-                       continue;
-               add_to_write_order(wo, &wo_end, &objects[i]);
-       }
+               /*
+                * And then all remaining commits and tags.
+                */
+               for (i = last_untagged; i < to_pack.nr_objects; i++) {
+                       if (oe_type(&objects[i]) != OBJ_COMMIT &&
+                           oe_type(&objects[i]) != OBJ_TAG)
+                               continue;
+                       add_to_write_order(wo, &wo_end, &objects[i]);
+               }
 
-       /*
-        * Finally all the rest in really tight order
-        */
-       for (i = last_untagged; i < to_pack.nr_objects; i++) {
-               if (!objects[i].filled)
-                       add_family_to_write_order(wo, &wo_end, &objects[i]);
+               /*
+                * And then all the trees.
+                */
+               for (i = last_untagged; i < to_pack.nr_objects; i++) {
+                       if (oe_type(&objects[i]) != OBJ_TREE)
+                               continue;
+                       add_to_write_order(wo, &wo_end, &objects[i]);
+               }
+
+               /*
+                * Finally all the rest in really tight order
+                */
+               for (i = last_untagged; i < to_pack.nr_objects; i++) {
+                       if (!objects[i].filled && objects[i].layer == 
write_layer)
+                               add_family_to_write_order(wo, &wo_end, 
&objects[i]);
+               }
        }
 
        if (wo_end != to_pack.nr_objects)
@@ -1504,7 +1516,8 @@ static void check_object(struct object_entry *entry)
                        break;
                }
 
-               if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, 
NULL))) {
+               if (base_ref && (base_entry = packlist_find(&to_pack, base_ref, 
NULL)) &&
+                   in_same_island(&entry->idx.oid, &base_entry->idx.oid)) {
                        /*
                         * If base_ref was set above that means we wish to
                         * reuse delta data, and we even found that base
@@ -1820,6 +1833,11 @@ static int type_size_sort(const void *_a, const void *_b)
                return -1;
        if (a->preferred_base < b->preferred_base)
                return 1;
+       if (use_delta_islands) {
+               int island_cmp = island_delta_cmp(&a->idx.oid, &b->idx.oid);
+               if (island_cmp)
+                       return island_cmp;
+       }
        if (a_size > b_size)
                return -1;
        if (a_size < b_size)
@@ -1968,6 +1986,9 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
        if (trg_size < src_size / 32)
                return 0;
 
+       if (!in_same_island(&trg->entry->idx.oid, &src->entry->idx.oid))
+               return 0;
+
        /* Load data if not already done */
        if (!trg->data) {
                read_lock();
@@ -2506,6 +2527,9 @@ static void prepare_pack(int window, int depth)
        uint32_t i, nr_deltas;
        unsigned n;
 
+       if (use_delta_islands)
+               resolve_tree_islands(progress, &to_pack);
+
        get_object_details();
 
        /*
@@ -2669,6 +2693,9 @@ static void show_commit(struct commit *commit, void *data)
 
        if (write_bitmap_index)
                index_commit_for_bitmap(commit);
+
+       if (use_delta_islands)
+               propagate_island_marks(commit);
 }
 
 static void show_object(struct object *obj, const char *name, void *data)
@@ -2676,6 +2703,19 @@ static void show_object(struct object *obj, const char 
*name, void *data)
        add_preferred_base_object(name);
        add_object_entry(&obj->oid, obj->type, name, 0);
        obj->flags |= OBJECT_ADDED;
+
+       if (use_delta_islands) {
+               const char *p;
+               unsigned depth = 0;
+               struct object_entry *ent;
+
+               for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
+                       depth++;
+
+               ent = packlist_find(&to_pack, obj->oid.hash, NULL);
+               if (ent && depth > ent->tree_depth)
+                       ent->tree_depth = depth;
+       }
 }
 
 static void show_object__ma_allow_any(struct object *obj, const char *name, 
void *data)
@@ -3003,6 +3043,9 @@ static void get_object_list(int ac, const char **av)
        if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
                return;
 
+       if (use_delta_islands)
+               load_delta_islands();
+
        if (prepare_revision_walk(&revs))
                die("revision walk setup failed");
        mark_edges_uninteresting(&revs, show_edge);
@@ -3182,6 +3225,8 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
                  option_parse_missing_action },
                OPT_BOOL(0, "exclude-promisor-objects", 
&exclude_promisor_objects,
                         N_("do not pack objects in promisor packfiles")),
+               OPT_BOOL(0, "delta-islands", &use_delta_islands,
+                        N_("enable islands for delta compression")),
                OPT_END(),
        };
 
@@ -3308,6 +3353,9 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
        if (pack_to_stdout || !rev_list_all)
                write_bitmap_index = 0;
 
+       if (use_delta_islands)
+               argv_array_push(&rp, "--topo-order");
+
        if (progress && all_progress_implied)
                progress = 2;
 
-- 
2.18.0.237.gffdb1dbdaa

Reply via email to