On Fri, Sep 30, 2016 at 10:54:16AM -0700, Linus Torvalds wrote:
> On Fri, Sep 30, 2016 at 1:06 AM, Jeff King <[email protected]> wrote:
> >
> > I agree that this deals with the performance concerns by caching the
> > default_abbrev_len and starting there. I still think it's unnecessarily
> > invasive to touch get_short_sha1() at all, which is otherwise only a
> > reading function.
>
> So the reason that d oesn't work is that the "disambiguate_state" data
> where we keep the number of objects is only visible within
> get_short_sha1().
>
> So outside that function, you don't have any sane way to figure out
> how many objects. So then you have to do the extra counting function..
Right. I think you should do the extra counting function. It's a few
more lines, but the design is way less tangled.
> > So IMHO, the best combination is the init_default_abbrev() you posted in
> > [1], but initialized at the top of find_unique_abbrev(). And cached
> > there, obviously, in a similar way.
>
> That's certainly possible, but I'm really not happy with how the
> counting function looks. And nobody actually stood up to say "yeah,
> that gets alternate loose objects right" or "if you have tons of those
> alternate loose objects you have other issues anyway". I think
> somebody would have to "own" that counting function, the advantage of
> just putting it into disambiguate_state is that we just get the
> counting for free..
I don't think you _need_ get the alternate loose objects right. In fact,
I don't think you need to care about loose objects at all. For the
scales we're talking about, they're a rounding error. I would have done
it like this:
diff --git a/sha1_file.c b/sha1_file.c
index 65deaf9..1845502 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1382,6 +1382,32 @@ static void prepare_packed_git_one(char *objdir, int
local)
strbuf_release(&path);
}
+static int approximate_object_count_valid;
+
+/*
+ * Give a fast, rough count of the number of objects in the repository. This
+ * ignores loose objects completely. If you have a lot of them, then either
+ * you should repack because your performance will be awful, or they are
+ * all unreachable objects about to be pruned, in which case they're not really
+ * interesting as a measure of repo size in the first place.
+ */
+unsigned long approximate_object_count(void)
+{
+ static unsigned long count;
+ if (!approximate_object_count_valid) {
+ struct packed_git *p;
+
+ prepare_packed_git();
+ count = 0;
+ for (p = packed_git; p; p = p->next) {
+ if (open_pack_index(p))
+ continue;
+ count += p->num_objects;
+ }
+ }
+ return count;
+}
+
static void *get_next_packed_git(const void *p)
{
return ((const struct packed_git *)p)->next;
@@ -1456,6 +1482,7 @@ void prepare_packed_git(void)
void reprepare_packed_git(void)
{
+ approximate_object_count_valid = 0;
prepare_packed_git_run_once = 0;
prepare_packed_git();
}