Hi! The Google Summer of Code period has started. Posting about what I'm currently working on as part of implementing the 'git unidiff' format for wc-wc diffs. Perhaps someone sees an obvious solution to any of the three problems I present below.
The callchain: =============== do_diff() svn_wc_diff_wc6() /* Calls svn_wc__db_read_children() and recurses into child dirs but * calls file_diff() on files. */ directory_elements_diff() file_diff() /* Will be called for copies and moves add-part. */ eb->file_changed() /* If called with '--show-copies-as-adds' the add-part will be * reported with this func. */ eb->file_added() /* Will eventually be called for both added, changed and * deleted paths. From here we call into the diff library to * create headers and actual diff content. */ diff_content_changed() Problem 1 ============== We're processing the delete-half instead of the add-half for a copy/move. At present, copy/move add-half is not shown by 'svn diff' unless invoked with '--show-copies-as-adds'. But with the 'git unidiff' format we would only show a diff for the add-part, e.g. we would change the behaviour from reporting the delete-half to reporting the add-half. Suggestion on solutions: 1) Call file_added() for all paths with db_status copied/moved instead of using file_changed(). + Allows us to easily detect what files are adds and the svn_wc_diff_callback already has a copyfrom arg. - We still have to distinguish between copied and moved paths in the callback. 2) Introduce file_copied() and file_moved() callbacks. Those could have a 'delete_half' arg or similar, e.g. we would pass both the add-half and del-half to that function and separate them with an parameter. + Might enhance readability. - We're introducing a concept that may be viewed as inconsistent with the rest of the code. svn handles 'add+del', is it acceptable to introduce copy and move at this level of the code? Further, the diff_file_added() and diff_file_deleted() does very little as is, wouldn't it be better to introduce some extra complexity there instead? 3) ... Problem 2 ============= We need to process the add-half to know the delete-half. The add-half knows about the delete-half through the copyfrom information, but the other way around does not work - there's no 'copyto' information. Since we don't want to present a diff for the delete-half, we need to be able to know that a deleted file actually is part of a delete. Suggestion on solutions: 1) Create a custom svn_wc__db_read_children() func that can has an out parameter callled 'copyto', e.g. the func would traverse the entire wc looking if there is any other paths that has the current path as copyfrom. + We can make the detecting operation less expensive than a file traversal since we have SQL and the db file will already be in memory. (Will involve more reads before we move to one db). - It feels a bit awkvard to fetch 'copyto'. There must have been a design decision to only only make the copyfrom relation unidirected. How does other code parts detect the copyto? 2) Store added and deleted paths as keys in two hashs with the diff contents inside diff_content_changed(). + We don't have to change directory_elements_diff(). - The memory consumption would grow with the number of deleted and added paths. If we choose to save them to tmp-files, we could avoid that but would suffer in performance instead. 3) ... Problem 3 ========== If we do changes in the callchain presented earlier, we would affect the 'url->wc' use case too. directory_elements_diff() is called from close_directory() and close_edit(). We would have git diffs for -r BASE:WORKING but no git diff headers for the revisions preceeding BASE. I'm assuming that it's not a problem since the current 'svn diff' command shows copies for the changes in the wc but not the ones before. Cheers, Daniel