On adapting the "commit" data flow to work with the new pristine text store and SHA-1 checksums.

OBSERVATIONS
============

The call graph during a commit is (adapted from
notes/wc-ng/use-of-tmp-text-base-path):

  [Call graph diagram.  Functions shown, by library layer, with their
   markers:]

  LIBSVN_CLIENT
    svn_client_commit4()                       [T]
    wc_to_repos_copy()                         [M]
    svn_client__do_commit()                    [N]
  ......................................................................
  LIBSVN_WC
    svn_wc_queue_committed3()
    svn_wc_transmit_text_deltas3()             [N]
    svn_wc__internal_transmit_text_deltas()    [N]
    svn_wc_process_committed_queue2()
    svn_wc__process_committed_internal()
    process_committed_leaf()
    svn_wc__text_base_path(tmp=TRUE)
      (Here the new text base is installed from the given path.)

The calling sequence is:

  svn_client__do_commit() calls svn_wc_transmit_text_deltas3(), once per
  modified file;

  then svn_client_commit4() calls svn_wc_queue_committed3(), once per
  significant node, to build a queue;

  then svn_wc_process_committed_queue2(), just once, passing that queue.

svn_wc_transmit_text_deltas3() does several things:

  - determine the new text base content by translating the working file
    to repository-normal form (RNF);
  - transmit deltas of that against the old text base;
  - verify the recorded checksum of the old text base;
  - optionally, store the new text base in a temporary file.

These purposes could be separated out a bit, although I don't think it's
particularly important to do so right now:

  - get a stream of the working text translated to RNF:
      svn_wc_translated_stream2()
  - get a stream of the old text base:
      svn_wc_get_pristine_contents2()
  - transmit deltas, given two readable streams:
      editor->apply_textdelta() followed by svn_txdelta_run()
  - write a stream to a new text base file:
      (We have old and new private APIs for this; I don't think we have
      public ones.)

The tricky bit here is: after writing a new text-base file, that file's
path (old way) or SHA-1 checksum (new way) needs to be communicated to
svn_wc_process_committed_queue().
The path isn't currently being communicated; it is being re-derived.

The obvious way (1):  Pass the list of new-text-base checksums on to
svn_wc_process_committed_queue().  That is relatively straightforward.  I
need to check whether the queue already has a separate entry for every
modified file, and make sure it does.

Another possible way (2):  If, in svn_wc_transmit_text_deltas3() or just
afterwards, we were to store the checksum in the ACTUAL_NODE table, in a
checksum field that represents the "Repo-Normal-Form of the ACTUAL text
which is currently being committed", then at commit post-processing time
we could get this checksum from the DB, knowing only the working file's
path, and write it into the BASE_NODE table.  (We wouldn't rely on the
working file remaining untouched on disk, because we've stored a copy of
this checksummed text into the pristine store at the same time.)

What are the pros and cons?  See COMPATIBILITY, below.


THE NEW WAY
===========

This is a straightforward way to modify the new API.

Note: svn_wc_transmit_text_deltas3(), svn_wc_queue_committed3() and
svn_wc_process_committed_queue2() are already new in 1.7; their
predecessors must be kept working for backward compatibility.

svn_wc_transmit_text_deltas3() shall:

  - write the new text base into the pristine store rather than a
    particular path;
  - return the SHA-1 checksum of the new text base;
  - no longer return the old "tempfile" and "md5_digest" outputs.

svn_wc_queue_committed3() shall:

  - take the SHA-1 checksum of every modified file *and every new file*
    (instead of an MD5 checksum).

svn_wc_process_committed_queue2() shall:

  - use the SHA-1 checksums found in the queue.


COMPATIBILITY
=============

We need to keep the old WC interface working:

  svn_wc_transmit_text_deltas2(&tempfile, &md5_digest, ...)
  svn_wc_queue_committed2(queue, path, ..., md5_checksum)
  svn_wc_process_committed_queue(queue, ...)

How?
I can't see a way to communicate the SHA-1 checksum to
svn_wc_process_committed_queue() via the queue, but I can think of the
following ways.

(1) An advantage of the method that stores the new checksum in the
ACTUAL_NODE table is that the backward-compatible old API can look there
to find the new text base:

  svn_wc_transmit_text_deltas2() shall, if TEMPFILE is non-null:

    - store the new text base (with its checksums) in the pristine store;
    - store the new text base's SHA-1 checksum in ACTUAL_NODE;
    - return the new text base's MD5 digest;
    - set *TEMPFILE to some path that it's safe for the caller to attempt
      to delete, but that is not otherwise meaningful.

  svn_wc_process_committed_queue() shall:

    - look in ACTUAL_NODE to find each file's new text base SHA-1;
    - "install" the new text base by simply writing that SHA-1 to
      BASE_NODE.

(2) If we use the simpler method (where the new API puts the new text in
the pristine store and passes only its SHA-1 checksum along), the only
solution I can find for the compatibility API is to keep working the old
way: put the temporary text base file at the old specially derivable
path, and then find it there within svn_wc_queue_committed2() or
svn_wc_process_committed_queue():

  svn_wc_transmit_text_deltas2() shall, if TEMPFILE is non-null:

    - store the new text base at the special derived path;
    - set *TEMPFILE to that path;
    - return the new text base's MD5 digest.

  svn_wc_process_committed_queue() shall:

    - find the new text base at the special derived path;
    - calculate its SHA-1 checksum;
    - store it (with its checksums) in the pristine store;
    - put that SHA-1 checksum in ACTUAL_NODE.

Comments please.  Is either of those ways to be preferred?

One last thought: I haven't described here where an added file gets its
text base when it is committed.  Of course its SHA-1 checksum needs to be
calculated and passed on too, similar to a modified file but not using
svn_wc_transmit_text_deltas3().

- Julian
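

APPENDIX: SKETCH OF "THE OBVIOUS WAY (1)"
=========================================

To make the queue-based data flow concrete, here is a minimal C sketch of
it.  It is only an illustration under stated assumptions, not the real
libsvn_wc code: every sketch_* name is hypothetical, the checksum is a
64-bit FNV-1a hash standing in for SHA-1, and small in-memory arrays
stand in for the pristine store, the commit queue and the BASE_NODE
table.  The point it shows is that the checksum travels through the
queue, so nothing has to be re-derived from a path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX 8

/* Stand-in for an SHA-1 checksum.  NOT real SHA-1: a 64-bit FNV-1a hash
   is used only to keep the sketch short. */
typedef struct { uint64_t v; } sketch_checksum_t;

typedef struct { sketch_checksum_t sum; const char *text; } sketch_pristine_t;
typedef struct { const char *path; sketch_checksum_t sum; } sketch_item_t;

/* Hypothetical working copy: all three tables are plain arrays. */
typedef struct {
  sketch_pristine_t pristines[SKETCH_MAX]; size_t n_pristines;
  sketch_item_t queue[SKETCH_MAX];         size_t n_queued;
  sketch_item_t base[SKETCH_MAX];          size_t n_base;
} sketch_wc_t;

static sketch_checksum_t sketch_hash(const char *text)
{
  sketch_checksum_t sum = { UINT64_C(14695981039346656037) };
  for (; *text; text++) {
    sum.v ^= (unsigned char)*text;
    sum.v *= UINT64_C(1099511628211);
  }
  return sum;
}

/* Transmit step: put the new text base in the pristine store and hand
   its checksum back to the caller.  Returns 0, or -1 if the store is
   full. */
static int sketch_transmit(sketch_wc_t *wc, const char *new_base_text,
                           sketch_checksum_t *sum_out)
{
  if (wc->n_pristines == SKETCH_MAX)
    return -1;
  *sum_out = sketch_hash(new_base_text);
  wc->pristines[wc->n_pristines].sum = *sum_out;
  wc->pristines[wc->n_pristines].text = new_base_text;
  wc->n_pristines++;
  return 0;
}

/* Queue step: one entry per committed file, carrying its checksum. */
static int sketch_queue_committed(sketch_wc_t *wc, const char *path,
                                  sketch_checksum_t sum)
{
  if (wc->n_queued == SKETCH_MAX)
    return -1;
  wc->queue[wc->n_queued].path = path;
  wc->queue[wc->n_queued].sum = sum;
  wc->n_queued++;
  return 0;
}

/* Post-commit step: "install" each new text base by writing its queued
   checksum into the base table, replacing any existing row for the same
   path.  Returns 0, or -1 if the base table is full. */
static int sketch_process_queue(sketch_wc_t *wc)
{
  size_t i, j;
  for (i = 0; i < wc->n_queued; i++) {
    for (j = 0; j < wc->n_base; j++)
      if (strcmp(wc->base[j].path, wc->queue[i].path) == 0)
        break;
    if (j == wc->n_base) {
      if (wc->n_base == SKETCH_MAX)
        return -1;
      wc->n_base++;
    }
    wc->base[j] = wc->queue[i];
  }
  wc->n_queued = 0;
  return 0;
}

/* Fetch a pristine text by checksum; NULL if it is not in the store. */
static const char *sketch_pristine_text(const sketch_wc_t *wc,
                                        sketch_checksum_t sum)
{
  size_t i;
  for (i = 0; i < wc->n_pristines; i++)
    if (wc->pristines[i].sum.v == sum.v)
      return wc->pristines[i].text;
  return NULL;
}
```

A caller would zero-initialise a sketch_wc_t, call sketch_transmit() and
sketch_queue_committed() once per modified file, then call
sketch_process_queue() once.  Afterwards each base row holds a checksum
that sketch_pristine_text() can resolve to the committed text.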