On 04/15/2013 07:39 PM, Junio C Hamano wrote:
> Michael Haggerty <mhag...@alum.mit.edu> writes:
>
>> Stop emitting an error message for dangling packed references found
>> when deleting another packed reference. See the previous commit for a
>> longer explanation of the issue.
>>
>> Change repack_without_ref_fn() to silently ignore dangling packed
>> references.
>>
>> Signed-off-by: Michael Haggerty <mhag...@alum.mit.edu>
>
> Somehow this feels as if it is sweeping the problem under the rug.
>
> If you ignore a ref for which a loose ref exists when you update a
> packed refs file, whether the stale "packed" one points at an object
> that is still there or an object that has been garbage collected,
> you would not even have to check if the "ref" resolves to object or
> anything like that, no?
>
> Am I missing something?
>
> This one feels iffy in the otherwise pleasant-to-read series.

The usual situation when this code would be triggered would be that the
packed reference is overridden by a loose ref and points at an object
that has been garbage collected. In that case it is definitely
incorrect to emit an error message.

But the fact that we don't explicitly verify that there is an
overriding loose reference means that it is possible that the failure
to resolve the packed ref comes from some kind of repository
corruption, and you are correct that such a problem would be swept
under the rug by my change.

I've been trying to minimize the extra work that repack_without_ref()
needs to do to write peeled references, to avoid stretching out the
delay that can now occur when deleting a reference. Thus I was trying
to save a check of loose references during this operation. But I guess
I agree that a little bit more caution would be prudent.

I can think of a few ways to avoid sweeping possible indications of
repo corruption under the rug, in order of increasing run-time:

1. If a packed ref's SHA-1 cannot be resolved, write the packed ref to
   the new packed-refs file anyway with SHA-1 but without a peeled
   value. This would avoid having to check the loose references and
   avoid erasing possible evidence of corruption, but would delay an
   actual check for corruption until a later time. It would be a quick
   fix, effectively kicking the can down the road instead of sweeping
   it under the rug.

   Minor pitfall: a reference that is listed without a peeled value in
   a fully-peeled packed-refs file tells future readers that the
   corresponding SHA-1 *cannot* be peeled. IF the named object would
   somehow reappear in the repository (e.g., via a fetch) and IF the
   object is peelable and IF there is in fact no loose ref overriding
   the packed ref, then the final result would be that one form of
   corruption (reference points to non-existent object) would be
   converted to another form (reference falsely believed to be
   non-peelable).
   I think this is an acceptable risk because (a) it would only happen
   in an unlikely series of events in a repo that was already corrupt,
   and (b) falsely believing a reference to be non-peelable wouldn't
   have terrible consequences.

2. Whenever a packed reference cannot be resolved to an object, verify
   that there is indeed a loose reference overriding it; if not, emit
   an error. In either case, omit the packed ref from the output.

3. Check for an overriding loose reference *before* trying to peel a
   packed reference, and omit any overridden packed references from
   the output packed-refs file. This would be close to running
   "pack-refs --no-prune" without the "is_tag_ref" test and with reuse
   of available peeled values.

   This approach would tidy up the packed-refs file a bit more than
   (2) because it would cause the deletion of more overridden packed
   refs, but only as part of first peeling them, which should only
   happen once in a repo, and only if the first peeling occurs within
   repack_without_ref() as opposed to an explicit pack_refs(). So it's
   a negligible improvement over (2).

4. Further along the "correctness" spectrum, one could check for
   overriding loose references *every* time the packed-refs file is
   rewritten by repack_without_ref(), even for references whose peeled
   values are already known. But this would add overhead to every
   deletion of a packed reference, which is probably not justified.

I'm worried that implementing 2-4 would introduce new race conditions
of the type that Peff discovered recently, unless we fix the locking
policy first (which is also on my TODO list).

So my suggestion is to implement 1 now and implement 2 sometime in the
future. Opinions?

Michael

--
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
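
For illustration, here is a rough sketch of how options 1 and 2 differ
when a packed ref's SHA-1 cannot be resolved. It is a toy in-memory
model in Python, not git's C code: the function signature, the
"objects" mapping, the shortened object names and the error text are
all assumptions made up for the example. Only the name
repack_without_ref comes from the discussion above.

```python
from typing import Dict, List, Optional, Set, Tuple


def repack_without_ref(
    packed: Dict[str, str],             # refname -> object name (hypothetical model)
    loose: Set[str],                    # refnames that also exist as loose refs
    objects: Dict[str, Optional[str]],  # object name -> peeled value, or None if
                                        # not peelable; a missing key means the
                                        # object is absent from the repository
    delete: str,                        # packed ref being deleted
    option: int,                        # 1 or 2, as numbered in the list above
) -> Tuple[List[str], List[str]]:
    """Return (lines of the new packed-refs file, error messages)."""
    lines: List[str] = []
    errors: List[str] = []
    for refname in sorted(packed):
        if refname == delete:
            continue
        sha = packed[refname]
        if sha in objects:
            # Normal case: write the ref, plus its peeled value if it has one.
            lines.append(f"{sha} {refname}")
            peeled = objects[sha]
            if peeled is not None:
                lines.append(f"^{peeled}")
        elif option == 1:
            # Option 1: keep the entry without a peeled value, no loose-ref check.
            lines.append(f"{sha} {refname}")
        elif option == 2:
            # Option 2: drop the entry; complain only if no loose ref overrides it.
            if refname not in loose:
                errors.append(f"{refname} does not point to a valid object")
        else:
            raise ValueError("only options 1 and 2 are modelled")
    return lines, errors


# Made-up sample data: "stale" is overridden by a loose ref and its object
# was garbage collected; "broken" has no loose ref and its object is missing.
PACKED = {
    "refs/heads/master": "1111",
    "refs/heads/stale": "2222",
    "refs/heads/broken": "3333",
    "refs/tags/v1": "4444",
    "refs/heads/topic": "5555",
}
LOOSE = {"refs/heads/stale"}
OBJECTS = {"1111": None, "4444": "1111", "5555": None}
```

With this data and "refs/heads/topic" as the ref being deleted, option
1 keeps both "stale" and "broken" in the output and reports nothing,
while option 2 drops both and reports an error only for "broken".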