RFC: Improving the 'diff callbacks'

Julian Foad Tue, 02 Aug 2011 06:59:29 -0700

I'm looking at rationalizing the way the client handles diffs, which is
currently mostly through the "svn_wc_diff_callbacks4_t" interface, with
some use made of svn_delta_editor_t.  I'll loosely abbreviate symbol
names below, for example "diff_callbacks_t" to refer to the former.


In libsvn_client, currently:

  Diff what?    Producer            Interface           Consumer

Normal diff:
  repo-repo  svn_ra_do_diff3 + |> diff_callbacks_t |> diff_callbacks
             get_diff_editor   |                   |   in diff.c;
  repo-wc    svn_ra + svn_wc   |                   |   prints diff to
  wc-wc      svn_wc_diff6      |                   |   stdout

Summarizing diff:
  repo-repo  svn_ra_do_diff3   |> delta_editor_t   |> repos_diff_
                               |                   |   summarize.c;
                               |                   |   calls sum-cb
  repo-wc    not implemented
  wc-wc      not implemented

What's going on here?  Why is the summarizing diff output subsystem not
simply an alternative consumer to the normal diff?  If it used the same
interface, we should be able to plug it straight in.

First I wondered why the delta_editor_t isn't a suitable interface for
diffs, and why the WC defines its own 'callback' type for this.  One
reason is because we want a symmetric diff, one that provides the full
content of both what's added and what's deleted.  Although the
summarizing diff doesn't need to know about file content or property
deletions, it does want to know about file and directory deletions, so
that would seem to be a good thing.  So why aren't we using the
diff_callbacks_t here?

The reason seems to be because the summarize code wants to collect up
all the changes for a given node and then issue a single notification at
'close file' or 'close directory' time, and the diff_callbacks_t doesn't
provide that functionality.  It does provide a 'dir_closed' function,
but no 'close_file', and simply having a function isn't necessarily
enough: it doesn't provide a per-node baton through which the consumer
can link the 'open', 'change' and 'close' calls for a single node
together.

To get around the lack of a per-node baton, I think the consumer could
collect the information it wants in a hash, keyed on path, that's global
to the whole diff operation.  And I'm still checking whether the
interface guarantees are sufficient to not need a 'file_closed'
function; it may be so.

So, my initial aim is to make 'diff --summarize' work through the very
same diff callbacks structure, driven from the very same diff callbacks
providers, as the normal diff, and so make 'summarize' available from
all diff sources.  Whether that involves adding per-node batons to the
interface or other changes or no changes at all, I'm not sure yet.

Next, I'm interested in 'merge' which is the other current consumer of
this diff interface.  

A second reason why the delta_editor_t isn't suitable seems to be that
the diff callbacks are tasked with communicating conflicts back to the
caller, for use in a merge.  So let's look at merge as well.

Merge:
  repo-repo  svn_ra_do_diff3 + |> diff_callbacks_t |> merge_callbacks
             get_diff_editor   |                   |   in merge.c;
                               |                   |   edits the wc
  repo-wc    not implemented
  wc-wc      not implemented

We might not care to implement 'merge' with a WC-WC or WC-repo source,
but at least we have roughly the right infrastructure to be able to do
so, which is a sign of a good design.

I'll take a look at all these parameters where we communicate the merge
status back to the caller:

  file_opened() => tree_conflicted, skip
  file_changed() => contentstate, propstate, tree_conflicted
  file_added() => contentstate, propstate, tree_conflicted
  file_deleted() => state, tree_conflicted
  etc.

At first sight it looks like there are more of these than I'd expect in
a general diff interface.  Maybe some of this info would be better
communicated a different way, perhaps through a per-node baton.

And after that I'm interested in whether 'patch' could theoretically be
implemented as a patchfile-to-diff-callbacks parser followed by the
standard 'merge' consumer.  If we can do things like that, it opens the
door to chaining blocks of functionality together in more interesting
ways.

- Julian

RFC: Improving the 'diff callbacks'

Reply via email to