Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info
Jeff Kingwrites: > On Tue, Feb 28, 2017 at 01:42:44PM -0800, Junio C Hamano wrote: > >> Jonathan Tan writes: >> >> > It could be argued that in the future, Git might need to distinguish >> > tree_objects from blob_objects - in particular, a user might want >> > rev-list to print the trees but not the blobs. >> >> That was exactly why these bits were originally made to "appear >> independent but in practice nobody sets only one and leaves others >> off". >> >> And it didn't happen in the past 10 years, which tells us that we >> should take this patch. > > I actually have a patch which uses the distinction. It's for > upload-archive doing reachability checks (which seems rather familiar to > what's going on here). OK. Thanks for stopping me ;-)
Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info
On Fri, Feb 24, 2017 at 05:18:36PM -0800, Jonathan Tan wrote: > Whenever tree_objects is set to 1 in revision.h's struct rev_info, > blob_objects is likewise set, and vice versa. Combine those two fields > into one. > > Some of the existing code does not handle tree_objects being different > from blob_objects properly. For example, "handle_commit" in revision.c > recurses from an UNINTERESTING tree into its subtree if tree_objects == > 1, completely ignoring blob_objects; it probably should still recurse if > tree_objects == 0 and blob_objects == 1 (to mark the blobs), and should > behave differently depending on blob_objects (controlling the > instantiation and marking of blob objects). This commit resolves the > issue by forbidding tree_objects from being different to blob_objects. Yeah, I agree that is awkward. I'm OK with the rule "if blob_objects is set, then tree_objects must also be set". It's the other way around I care more about. > It could be argued that in the future, Git might need to distinguish > tree_objects from blob_objects - in particular, a user might want > rev-list to print the trees but not the blobs. However, this results in > a minor performance savings at best in that objects no longer need to be > instantiated (causing memory allocations and hashtable insertions) - no > disk reads are being done for objects whether blob_objects is set or > not. In a full object-graph traversal, we actually spend a big chunk of our time in hash lookups. My measurements (admittedly from 2013, which I haven't repeated lately) show something like a 20-25% speedup for this case. My only use for it (and the source of those timings) was to compute archive reachability, which nobody seems to care too much about. But I suspect we could speed up your case, too, when we are just computing the reachability of a non-blob. I.e., you should be able to turn on the smallest subset of "commits only", "commits and trees", and "commits, trees, and blobs", based on what the other side has asked for. -Peff
Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info
On Tue, Feb 28, 2017 at 01:42:44PM -0800, Junio C Hamano wrote: > Jonathan Tanwrites: > > > It could be argued that in the future, Git might need to distinguish > > tree_objects from blob_objects - in particular, a user might want > > rev-list to print the trees but not the blobs. > > That was exactly why these bits were originally made to "appear > independent but in practice nobody sets only one and leaves others > off". > > And it didn't happen in the past 10 years, which tells us that we > should take this patch. I actually have a patch which uses the distinction. It's for upload-archive doing reachability checks (which seems rather familiar to what's going on here). The whole series (from 2013!) is at: git://github.com/peff/git jk/archive-reachability but the relevant commits are below. I don't think the same logic holds for this case, though, because somebody actually can ask for a single blob. -- >8 -- From: Jeff King Date: Wed, 5 Jun 2013 17:57:02 -0400 Subject: [PATCH] list-objects: optimize "revs->blob_objects = 0" case If we are traversing trees during a "--objects" traversal, we may skip blobs if the "blob_objects" field of rev_info is not set. But we do so as the first thing in process_blob(), only after we have actually created the "struct blob" object, incurring a hash lookup. We can optimize out this no-op call completely. This does not actually affect any current code, as all of the current traversals always set blob_objects when looking at objects, anyway. Signed-off-by: Jeff King --- list-objects.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/list-objects.c b/list-objects.c index f3ca6aafb..58ad69557 100644 --- a/list-objects.c +++ b/list-objects.c @@ -117,7 +117,7 @@ static void process_tree(struct rev_info *revs, process_gitlink(revs, entry.oid->hash, show, base, entry.path, cb_data); - else + else if (revs->blob_objects) process_blob(revs, lookup_blob(entry.oid->hash), show, base, entry.path, -- 2.12.0.359.gd4c8c42e9 -- >8 -- From: Jeff King Date: Wed, 5 Jun 2013 18:02:42 -0400 Subject: [PATCH] archive: ignore blob objects when checking reachability We cannot create an archive from a blob object, so we would not expect anyone to provide one to us. And if they do, we will fail anyway just after the reachability check. We can therefore optimize our reachability check to ignore blobs completely, and not even create a "struct blob" for them. Depending on the repository size and the exact place we find the reachable object in the traversal, this can save 20-25%, a we can avoid many lookups in the object hash. The downside of this is that a blob provided to a remote archive process will fail with "no such object" rather than "object is not a tree" (we could organize the code to retain the old message, but since we no longer know whether the blob is reachable or not, we would potentially be leaking information about the existence of unreachable objects). Signed-off-by: Jeff King --- archive.c | 1 + 1 file changed, 1 insertion(+) diff --git a/archive.c b/archive.c index ef89b2556..489115f9f 100644 --- a/archive.c +++ b/archive.c @@ -383,6 +383,7 @@ static int object_is_reachable(const unsigned char *sha1) save_commit_buffer = 0; init_revisions(, NULL); setup_revisions(ARRAY_SIZE(argv) - 1, argv, , NULL); + data.revs.blob_objects = 0; if (prepare_revision_walk()) return 0; -- 2.12.0.359.gd4c8c42e9
Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info
Jonathan Tanwrites: > It could be argued that in the future, Git might need to distinguish > tree_objects from blob_objects - in particular, a user might want > rev-list to print the trees but not the blobs. That was exactly why these bits were originally made to "appear independent but in practice nobody sets only one and leaves others off". And it didn't happen in the past 10 years, which tells us that we should take this patch. Thanks.
[PATCH 1/3] revision: unify {tree,blob}_objects in rev_info
Whenever tree_objects is set to 1 in revision.h's struct rev_info, blob_objects is likewise set, and vice versa. Combine those two fields into one. Some of the existing code does not handle tree_objects being different from blob_objects properly. For example, "handle_commit" in revision.c recurses from an UNINTERESTING tree into its subtree if tree_objects == 1, completely ignoring blob_objects; it probably should still recurse if tree_objects == 0 and blob_objects == 1 (to mark the blobs), and should behave differently depending on blob_objects (controlling the instantiation and marking of blob objects). This commit resolves the issue by forbidding tree_objects from being different to blob_objects. It could be argued that in the future, Git might need to distinguish tree_objects from blob_objects - in particular, a user might want rev-list to print the trees but not the blobs. However, this results in a minor performance savings at best in that objects no longer need to be instantiated (causing memory allocations and hashtable insertions) - no disk reads are being done for objects whether blob_objects is set or not. Signed-off-by: Jonathan Tan--- bisect.c| 2 +- builtin/rev-list.c | 6 +++--- list-objects.c | 4 ++-- pack-bitmap-write.c | 3 +-- pack-bitmap.c | 3 +-- reachable.c | 3 +-- revision.c | 16 ++-- revision.h | 3 +-- 8 files changed, 16 insertions(+), 24 deletions(-) diff --git a/bisect.c b/bisect.c index 8e63c40d2..265b32905 100644 --- a/bisect.c +++ b/bisect.c @@ -634,7 +634,7 @@ static void bisect_common(struct rev_info *revs) { if (prepare_revision_walk(revs)) die("revision walk setup failed"); - if (revs->tree_objects) + if (revs->tree_and_blob_objects) mark_edges_uninteresting(revs, NULL); } diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 0aa93d589..6c2651b31 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -345,7 +345,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) revs.commit_format = CMIT_FMT_RAW; if ((!revs.commits && -(!(revs.tag_objects || revs.tree_objects || revs.blob_objects) && +(!(revs.tag_objects || revs.tree_and_blob_objects) && !revs.pending.nr)) || revs.diff) usage(rev_list_usage); @@ -374,7 +374,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) return 0; } } else if (revs.max_count < 0 && - revs.tag_objects && revs.tree_objects && revs.blob_objects) { + revs.tag_objects && revs.tree_and_blob_objects) { if (!prepare_bitmap_walk()) { traverse_bitmap_commit_list(_object_fast); return 0; @@ -384,7 +384,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) if (prepare_revision_walk()) die("revision walk setup failed"); - if (revs.tree_objects) + if (revs.tree_and_blob_objects) mark_edges_uninteresting(, show_edge); if (bisect_list) { diff --git a/list-objects.c b/list-objects.c index f3ca6aafb..796957105 100644 --- a/list-objects.c +++ b/list-objects.c @@ -18,7 +18,7 @@ static void process_blob(struct rev_info *revs, struct object *obj = >object; size_t pathlen; - if (!revs->blob_objects) + if (!revs->tree_and_blob_objects) return; if (!obj) die("bad blob object"); @@ -78,7 +78,7 @@ static void process_tree(struct rev_info *revs, all_entries_interesting: entry_not_interesting; int baselen = base->len; - if (!revs->tree_objects) + if (!revs->tree_and_blob_objects) return; if (!obj) die("bad tree object"); diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 970559601..5080e276b 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -258,8 +258,7 @@ void bitmap_writer_build(struct packing_data *to_pack) init_revisions(, NULL); revs.tag_objects = 1; - revs.tree_objects = 1; - revs.blob_objects = 1; + revs.tree_and_blob_objects = 1; revs.no_walk = 0; revs.include_check = should_include; diff --git a/pack-bitmap.c b/pack-bitmap.c index 39bcc1684..445a24e0d 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -951,8 +951,7 @@ void test_bitmap_walk(struct rev_info *revs) die("Commit %s doesn't have an indexed bitmap", oid_to_hex(>oid)); revs->tag_objects = 1; - revs->tree_objects = 1; - revs->blob_objects = 1; + revs->tree_and_blob_objects = 1; result_popcnt = bitmap_popcount(result); diff --git a/reachable.c