I'm looking at rationalizing the way the client handles diffs, which is currently mostly through the "svn_wc_diff_callbacks4_t" interface, with some use made of svn_delta_editor_t. I'll loosely abbreviate symbol names below, for example "diff_callbacks_t" to refer to the former.
In libsvn_client, currently: Diff what? Producer Interface Consumer Normal diff: repo-repo svn_ra_do_diff3 + |> diff_callbacks_t |> diff_callbacks get_diff_editor | | in diff.c; repo-wc svn_ra + svn_wc | | prints diff to wc-wc svn_wc_diff6 | | stdout Summarizing diff: repo-repo svn_ra_do_diff3 |> delta_editor_t |> repos_diff_ | | summarize.c; | | calls sum-cb repo-wc not implemented wc-wc not implemented What's going on here? Why is the summarizing diff output subsystem not simply an alternative consumer to the normal diff? If it used the same interface, we should be able to plug it straight in. First I wondered why the delta_editor_t isn't a suitable interface for diffs, and why the WC defines its own 'callback' type for this. One reason is because we want a symmetric diff, one that provides the full content of both what's added and what's deleted. Although the summarizing diff doesn't need to know about file content or property deletions, it does want to know about file and directory deletions, so that would seem to be a good thing. So why aren't we using the diff_callbacks_t here? The reason seems to be because the summarize code wants to collect up all the changes for a given node and then issue a single notification at 'close file' or 'close directory' time, and the diff_callbacks_t doesn't provide that functionality. It does provide a 'dir_closed' function, but no 'close_file', and simply having a function isn't necessarily enough: it doesn't provide a per-node baton through which the consumer can link the 'open', 'change' and 'close' calls for a single node together. To get around the lack of a per-node baton, I think the consumer could collect the information it wants in a hash, keyed on path, that's global to the whole diff operation. And I'm still checking whether the interface guarantees are sufficient to not need a 'file_closed' function; it may be so. So, my initial aim is to make 'diff --summarize' work through the very same diff callbacks structure, driven from the very same diff callbacks providers, as the normal diff, and so make 'summarize' available from all diff sources. Whether that involves adding per-node batons to the interface or other changes or no changes at all, I'm not sure yet. Next, I'm interested in 'merge' which is the other current consumer of this diff interface. A second reason why the delta_editor_t isn't suitable seems to be that the diff callbacks are tasked with communicating conflicts back to the caller, for use in a merge. So let's look at merge as well. Merge: repo-repo svn_ra_do_diff3 + |> diff_callbacks_t |> merge_callbacks get_diff_editor | | in merge.c; | | edits the wc repo-wc not implemented wc-wc not implemented We might not care to implement 'merge' with a WC-WC or WC-repo source, but at least we have roughly the right infrastructure to be able to do so, which is a sign of a good design. I'll take a look at all these parameters where we communicate the merge status back to the caller: file_opened() => tree_conflicted, skip file_changed() => contentstate, propstate, tree_conflicted file_added() => contentstate, propstate, tree_conflicted file_deleted() => state, tree_conflicted etc. At first sight it looks like there are more of these than I'd expect in a general diff interface. Maybe some of this info would be better communicated a different way, perhaps through a per-node baton. And after that I'm interested in whether 'patch' could theoretically be implemented as a patchfile-to-diff-callbacks parser followed by the standard 'merge' consumer. If we can do things like that, it opens the door to chaining blocks of functionality together in more interesting ways. - Julian