On Wed, 2010-04-21, Julian Foad wrote:
> Greg Stein wrote:
> > On Wed, Apr 21, 2010 at 05:09, Philip Martin <philip.mar...@wandisco.com>
> > wrote:
> > > Julian Foad <julian.f...@wandisco.com> writes:
> > >
> > >> COMPATIBILITY
> > >> =============
> > >>
> > >> We need to keep the old WC interface working:
> > >>
> > >>   svn_wc_transmit_text_deltas2(&tempfile, &md5_digest, ...)
> > >>   svn_wc_queue_committed2(queue, path, ..., md5_checksum)
> > >>   svn_wc_process_committed_queue(queue, ...)
> > >>
> > >> How? I can't see a way to communicate the SHA-1 checksum to
> > >> svn_wc_process_committed_queue() via the queue, but I can think of the
> > >> following ways.
> > >
> > > There is an access baton in the old interface, it's opaque and can be
> > > made to store anything. It could contain a hash of filename=>SHA-1.
>
> Thanks, Philip - I hadn't thought of that. I'll bear it in mind. It
> could be useful for other things too.
>
> > Presumably, transmit_text_deltas2 is a wrapper around deltas3. Thus,
> > deltas2 has the SHA1 value from that inner call.
> >
> > Putting something into *TEMPFILE is optional, so we can skip that. The
> > MD5 result from deltas2 can be fetched from the PRISTINE table, given
> > the SHA1 key.
>
> Yup, transmit_text_deltas2() can easily know and return the MD-5.
>
> > queue_committed2 can use the MD5 value and key into PRISTINE (we
> > should have an index on PRISTINE.md5_checksum) to find the SHA1.
>
> I wondered about looking up the pristine text from its MD-5. Certainly
> possible (preferably via an index for speed).
>
> At first I had a slight concern about the remote possibility of MD-5
> collisions. I now think we can alleviate the concern by checking if a
> new pristine text ever has an MD-5 that's already recorded in the
> pristine store against a different SHA-1. If that ever happens, we can
> issue a warning or error, the resolution of which is "upgrade to 1.7+,
> which no longer relies on MD-5 uniqueness".
>
> The bit I'm not sure about is whether the MD-5 of *every* new text base
> in the commit is actually passed through the queue. I'll go and test
> whether it is - it doesn't look like it, from the way I read the code.

BTW I confirmed that, yesterday: it's true with the current code.
However, the APIs have been through several revisions, and the oldest
ones - svn_wc_process_committed() and svn_wc_process_committed2() -
don't even communicate the MD5 checksum on to the post-commit step.

In IRC discussion with Greg we decided on a probable strategy for
dealing with the old APIs: make them rely on a fixed, deterministic,
tmp path, like they always have done. Instead of the WC-1 scheme which
gave a path like <dir/somedir/.svn/tmp/text-base/FOO>, the WC-NG code
will provide a path that's safe to be in a single .svn dir at the root
of a WC, such as <.svn/tmp/text-base/dir/somedir/FOO> or maybe encoding
<dir/somedir/foo> into a single path component if that's better than
dealing with arbitrary levels of subdirs.

- Julian

> > This seems pretty straight-forward, unless I've missed something.
> >
> > (note that the pristine store would have this "extra" pristine,
> > unreferenced by any other table; that could get garbage-cleaned by a
> > separate process... BUT: there is an admin lock present during a
> > commit, so we'd simply avoid GC'ing the PRISTINE table/on-disk)
>
> Yup, I'm happy that we can manage the GC properly, in one way or
> another.
>
> Thanks for the feedback.
>
> - Julian
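
To make the two deterministic tmp path options from the reply above concrete, here is a rough sketch in Python rather than the C the working copy library is written in. It is not Subversion code: the function names, the use of percent-encoding for the single-component variant, and the validation rules are all assumptions made for illustration. Only the ".svn/tmp/text-base" prefix and the example path come from the mail.

```python
import posixpath
from urllib.parse import quote

# Hypothetical helpers, not part of any Subversion API.


def _check_relpath(relpath):
    """Reject paths that are empty, absolute, or could escape the tmp area."""
    parts = relpath.split("/")
    if not relpath or any(p in ("", ".", "..") for p in parts):
        raise ValueError("not a clean WC-relative path: %r" % relpath)


def nested_tmp_path(relpath, admin_dir=".svn"):
    """Option 1: mirror the WC directory structure under the root tmp area."""
    _check_relpath(relpath)
    return posixpath.join(admin_dir, "tmp", "text-base", relpath)


def flat_tmp_path(relpath, admin_dir=".svn"):
    """Option 2: encode the whole relative path into one path component.

    Percent-encoding is an assumed choice of encoding. With safe="" it
    escapes "/" and "%" itself, so distinct paths never map to the same
    name.
    """
    _check_relpath(relpath)
    encoded = quote(relpath, safe="")
    return posixpath.join(admin_dir, "tmp", "text-base", encoded)


if __name__ == "__main__":
    print(nested_tmp_path("dir/somedir/FOO"))
    print(flat_tmp_path("dir/somedir/FOO"))
```

Both functions are pure functions of the WC-relative path, which is the property the old APIs need: the post-commit step can recompute the location without being handed a checksum. The flat variant avoids creating arbitrary levels of subdirs at the cost of less readable names.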