[this email is from last week, and I think most of it was responded to in
other parts of the thread, but there were a few loose ends]
On Sat, Mar 16, 2013 at 02:38:12PM +0100, Michael Haggerty wrote:
> >> * Change pack-refs to use the peeled information from ref_entries if it
> >> is available, rather than looking up the references again.
> > I don't know that it matters from a performance standpoint (we don't
> > really care how long pack-refs takes, as long as it is within reason,
> > because it is run as part of a gc). But it would mean that any errors
> > in the file are propagated from one run of pack-refs to the next. I
> > don't know if we want to spend the extra time to be more robust.
> I thought about this argument too. Either way is OK with me. BTW would
> it make sense to build a consistency check of peeled references into fsck?
Yeah, I do think an fsck check makes sense. It should not be expensive,
and if there is a realistic corruption/health check for a repo, it makes
sense to me for it to be part of fsck. I do not recall many incidents of
packed-refs corruption in the history of the list, but this fairly
recent one comes to mind:
On the other hand, if it is just as cheap to rewrite the file as it is
to do the health checks, I wonder if the advice should simply be "run
pack-refs again" (and doing so should always start from scratch, not
trusting the existing version).
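A minimal sketch of what such a consistency check could look like, in
Python rather than git's C (the `peel` callback here is a stand-in for
actually reading the tag object and dereferencing it; the function and
names are hypothetical, not anything in git):

```python
# Sketch: verify that each "^<sha1>" peeled line in a packed-refs
# file matches an independently computed peel of the preceding ref.
# `peel(refname)` stands in for dereferencing the tag object, which
# is what a real fsck check would do.

def check_peeled(packed_refs_text, peel):
    """Yield (refname, recorded, expected) for every mismatch."""
    last_ref = None
    for line in packed_refs_text.splitlines():
        if line.startswith('#'):      # traits header, e.g. "# pack-refs with: peeled"
            continue
        if line.startswith('^'):      # peeled value for the previous ref
            recorded = line[1:]
            expected = peel(last_ref)
            if recorded != expected:
                yield (last_ref, recorded, expected)
        else:
            sha1, _, last_ref = line.partition(' ')

# Example with a deliberately corrupted peeled line:
text = (
    "# pack-refs with: peeled\n"
    "1111111111111111111111111111111111111111 refs/tags/v1.0\n"
    "^2222222222222222222222222222222222222222\n"
)
peel = {'refs/tags/v1.0': '3333333333333333333333333333333333333333'}.get
mismatches = list(check_peeled(text, peel))
```

The point is just that the check is cheap: one linear pass over the file,
plus one object lookup per peeled entry.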
> > Yuck. I really wonder if repack_without_ref should literally just be
> > writing out the exact same file, minus the lines for the deleted ref.
> > Is there a reason we need to do any processing at all? I guess the
> > answer is that you want to avoid re-parsing the current file, and just
> > write it out from memory. But don't we need to refresh the memory cache
> > from disk under the lock anyway? That was the pack-refs race that I
> > fixed recently.
> It would certainly be possible to rewrite the file textually without
> parsing it again. But I did it this way for two reasons:
> 1. It would be better to have one packed-refs file parser and writer
> rather than two. (I'm working towards unifying the two writers, and
> will continue once I understand their differences.)
I see your point, though I also feel that with the right refactoring,
they could share the parser. And the second writer would be mostly a
pass-through. But much more compelling is reason 2...
> 2. Given how peeled references were added, it seems dangerous to read,
> modify, and write data that might be in a future format that we don't
> entirely understand. For example, suppose a new feature is added to
> mark Junio's favorite tags:
> # pack-refs with: peeled fully-peeled junios-favorites \n
> ce432cac30f98b291be609a0fc974042a2156f55 refs/heads/master
> 83b3dd7151e7eb0e5ac62ee03aca7e6bccaa8d07 refs/tags/evil-dogs
> 7ed863a85a6ce2c4ac4476848310b8f917ab41f9 refs/tags/lolcats
> b0517166ae2ad92f3b17638cbdee0f04b8170d99 refs/tags/nonsense
Hmm. Good point. It is tempting to make a rule like "extensions that
are specific to a ref must come after the ref but before the next ref".
And then the writer could simply drop any lines between the to-delete
ref and the one that follows it.
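In Python pseudocode (hypothetical; the real repack_without_ref is C),
that "drop the ref and anything attached to it" rule might look like
the following. The only format knowledge it needs is how to recognize a
ref line; every other non-header line is treated as an opaque extension
that shares the fate of the ref above it:

```python
import re

# A ref line is "<40-hex sha1> <refname>"; anything else that is not
# the "# pack-refs with: ..." traits header (peeled "^" lines, unknown
# future extensions) is assumed to belong to the most recent ref.
REF_LINE = re.compile(r'^[0-9a-f]{40} (.+)$')

def repack_without_ref(text, refname):
    out, skipping = [], False
    for line in text.splitlines(keepends=True):
        m = REF_LINE.match(line.rstrip('\n'))
        if m:                          # a ref line resets the state
            skipping = (m.group(1) == refname)
        elif line.startswith('#'):     # file-level traits header: always keep
            skipping = False
        # opaque per-ref lines inherit `skipping` from the ref above
        if not skipping:
            out.append(line)
    return ''.join(out)
```

Note that this writer never has to understand the extension lines at
all, which is exactly what makes the "per-ref extensions follow their
ref" rule attractive.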
I think that would work OK in practice, but I am worried that it would
paint us into a corner later on (after all, we do not know what
extensions will be added in the future). The only thing I can think of
that would break is something where a ref or its extension depends on
previous entries. E.g., we start prefix-compressing the ref names to
save space. Of course that would break backwards compatibility horribly
anyway, so it's a no-go. But maybe some extension would want to refer to
a prior ref. I dunno.
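To make the prefix-compression worry concrete (a purely hypothetical
format, nothing git actually does): if each entry stored only a reuse
count plus the suffix not shared with the previous name, a line's
meaning would depend on the line before it, and textually dropping a
ref would silently corrupt its successor:

```python
# Hypothetical prefix-compressed ref list: each entry reuses the first
# `reuse` characters of the previous name and appends `suffix`.
def expand(entries):
    names, prev = [], ''
    for reuse, suffix in entries:
        name = prev[:reuse] + suffix
        names.append(name)
        prev = name
    return names

entries = [
    (0, 'refs/tags/lolcats'),
    (10, 'nonsense'),   # reuses "refs/tags/" from the previous line
]

# Intact file expands correctly...
full = expand(entries)

# ...but textually deleting the first entry corrupts the second, since
# its reuse count now points at a different (empty) predecessor.
broken = expand(entries[1:])
```

Which is why any such format would have to be a new file (or a new
traits flag) rather than something an old textual rewriter could be
trusted to edit.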