Hi!

I'm supposed to send this proposal to the Google Summer of Code machinery
and let it be forwarded to the interested mentor in the Subversion
community, in this case Stefan. In the interest of openness I'm posting it
here before sending it off to Google later today. Maybe someone has
something they'd like to add?

===========================================================
Git unidiff format extension to 'svn patch' and 'svn diff'
===========================================================

Contents
========

  Suggested workflow
  What is the git unidiff format?
  Creating the git headers
  Parsing the git headers
  Applying tree changes
  Applying mode changes
  Applying binary patches
  Applying property changes

Suggested workflow
------------------

Here's my project proposal for GSoC 2010. The purpose of the project is
pretty self-explanatory: make 'svn patch' and 'svn diff' able to deal with
git unidiff extensions. I've tried to point out some of the API changes
that are necessary, to show that I understand what needs to be done.

If I get accepted I would do things in this order:

1) Rev funcs to allow a use_git_format flag to be passed down to
   libsvn_diff and create git diff format patches for adds and deletes.
   Write a couple of tests to verify that we get the intended format.

2) Add the ability to track renames and copies in libsvn_diff, probably
   by using some wc funcs to get the status. My first assumption was that
   the svn_wc_diff_callbacks4_t vtable would be revved to allow for copied
   and moved scenarios once we have editor-v2, but Neels was talking about
   some bigger rewrite where the diff editor would be dropped. Either way,
   for the 'git unidiff format' work I need some way to detect copies and
   moves. Once I have detection, add the git headers for copies and
   renames and write tests to confirm the right behavior.

3) Determine how the base85 format works and write C-tests to confirm the
   behavior. Git does it like this: [4]

4) Pass down a flag for allowing or disallowing binary diffs to
   libsvn_diff. Detect binary files and write the patches. Write tests to
   confirm the behavior.

5) Allow 'svn patch' to apply git diff formats for adds and deletes.
   Write tests to confirm the behavior.

6) Allow 'svn patch' to apply git diff formats for moves and copies.

7) Allow 'svn patch' to apply git diff formats for binary patches. I
   probably need to think through what states the wc can be in:
   obstructed, missing, replaced, unversioned or ignored nodes, and so on.

8) Make libsvn_diff able to record modes. Probably we're only interested
   in the executable bit, and that one we can get from svn:executable.
   Write tests to confirm the behavior.

9) Allow 'svn patch' to apply mode changes (if we agree that we want that
   behavior).

10) Decide on a header for dealing with props. Do we need to stay
    compatible with git and diff? Probably, so we need a header that will
    be ignored by applications not interested in Subversion properties.

11) Decide on the header format for properties. Implement it in the diff
    code and write tests for it.

12) Extend the diff parser to deal with property diffs. Write tests.

13) Done.

What is the git unidiff format?
-------------------------------

The format is thoroughly described in [1], so I'll just recapitulate the
use cases for it:

1) Track copies and renames
2) File mode changes
3) Binary patches

Creating the git headers
------------------------

A couple of funcs need to be revved to pass down the necessary parameters
telling libsvn_diff to create a git diff. We also need a way to detect
copies and renames.

subversion/libsvn_client/diff.c
  (svn_client_diff5): We need a parameter to tell the diff machinery we
    want a git diff.
  (svn_wc_diff_callbacks4_t): We have callbacks for changed, added and
    deleted nodes but none for copied or moved nodes. Since we don't have
    editor-v2 we can't get that info from the server, so git diffs should
    only be possible for wc-wc diffs at the moment. For now I'll probably
    check the status of the path that we get in file_added() and record
    copied-from or moved-from.

subversion/libsvn_diff/diff_file.c
  (svn_diff_file_diff2):

Parsing the git headers
-----------------------

We have examples of how the parsing should be done in the Mercurial
source code [2].
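
To make the parsing step a bit more concrete, here is a minimal sketch of
the idea in Python (the real work would of course be C in libsvn_diff).
It reads the extended header lines git writes between "diff --git" and the
first hunk. The function name, the dictionary keys and the operation
labels are made up for this sketch, and it assumes paths without spaces
or quoting.

```python
import re

# Header prefixes that identify tree changes. The operation labels and
# dictionary keys are choices made for this sketch, not Subversion names.
_TREE_PREFIXES = [
    ("rename from ", "move", "oldname"),
    ("rename to ", "move", "newname"),
    ("copy from ", "copy", "oldname"),
    ("copy to ", "copy", "newname"),
]


def parse_git_header(lines):
    """Collect oldname, newname, operation and modes from one git header.

    `lines` is the header of a single file: it starts at "diff --git" and
    stops before the "---"/"+++" lines or the first hunk.
    """
    info = {"oldname": None, "newname": None, "operation": "modify",
            "old_mode": None, "new_mode": None}
    for raw in lines:
        line = raw.rstrip("\r\n")
        m = re.match(r"diff --git a/(.+) b/(.+)$", line)
        if m:
            # Simplification: breaks on paths that contain " b/".
            info["oldname"], info["newname"] = m.groups()
        elif line.startswith("new file mode "):
            info["operation"] = "add"
            info["new_mode"] = int(line[len("new file mode "):], 8)
        elif line.startswith("deleted file mode "):
            info["operation"] = "delete"
            info["old_mode"] = int(line[len("deleted file mode "):], 8)
        elif line.startswith("old mode "):
            info["old_mode"] = int(line[len("old mode "):], 8)
        elif line.startswith("new mode "):
            info["new_mode"] = int(line[len("new mode "):], 8)
        else:
            for prefix, operation, key in _TREE_PREFIXES:
                if line.startswith(prefix):
                    info["operation"] = operation
                    info[key] = line[len(prefix):]
                    break
            # Anything else ("index ...", "similarity index ...") is skipped.
    return info
```

Feeding it the header of a renamed file gives the old and new names plus
the "move" label. For a pure mode change the operation stays "modify", and
the executable bit can be read from new_mode & 0o111.
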
(The Mercurial link was found in the notes document referred to above
[1]. A big thank you to Augie Fackler for taking the time to write down
all the information.)

subversion/libsvn_diff/parse-diff.c
  (parse_git_hunk_header): Create this func, to be invoked before
    parse_hunk_header(). It captures oldname, newname, operation and mode.

Applying tree changes
---------------------

We already have many different scenarios to handle, with nodes being
obstructed, missing, ignored, unversioned and so on. If we track tree
changes the number of scenarios will increase. I should probably draw
some kind of graph to map out the possible scenarios.

subversion/libsvn_client/patch.c
  (install_patched_target): Here we're currently handling deletes, adds
    and modifications. With the git diff format we can handle copies and
    moves here too.

Applying mode changes
---------------------

Subversion does not allow file permissions to be recorded. I assume
that's because they are hard to make portable between Windows and
non-Windows filesystems [3]. We'll have to decide whether 'svn patch' and
'svn diff' should be able to deal with applying permissions. As I see it,
version control is about tracking file contents, not that kind of user
data, but if someone has a good use case, let me hear it! From what I
understand, mode changes are mostly used for setting the executable bit,
but we have svn:executable for that. Hrm, that of course can't be used
yet since we can't use property diffs. :-)

Applying binary patches
-----------------------

subversion/libsvn_client/patch.c
  (init_patch_target): If the content is binary it will be encoded with
    base85. It's a really, really small possibility, but the translated
    stream might treat something in the encoded data as a keyword and
    translate it.

Applying property changes
-------------------------

Subversion has properties and it would be great if those could be
included in patches. We have a diff format for properties that patch(1)
(and hopefully the rest of the patch family) ignores, i.e.
they can be displayed without being interpreted by the parser. We need a
header format that tells the parser which lines in the patch hold the
properties. All the action is in:

subversion/libsvn_client/diff.c
  (display_prop_diffs)

cheers,
Daniel

[1] notes/svnpatch/svnpatch-git.txt
[2] http://mercurial.selenic.com/hg/hg/file/ac02b43bc08a/mercurial/patch.py#l195
[3] http://pagesperso-orange.fr/b.andre/permissions.html
[4] http://git.kernel.org/?p=git/git.git;a=blob;f=base85.c;hb=HEAD
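
For step 3 of the workflow, here is a minimal sketch of the base85
encoding as I understand it from base85.c [4], written in Python rather
than C to keep it short. Every group of 4 bytes is read as a big-endian
32-bit number and written as 5 characters from an 85-symbol alphabet. The
function names are made up for this sketch, and it covers the encoding
only: the per-line length prefix and the compression that git's binary
patches also use are left out.

```python
# Alphabet as I read it in git's base85.c: digits, upper case, lower case,
# then 23 punctuation characters, 85 symbols in total.
ALPHABET = ("0123456789"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "!#$%&()*+-;<=>?@^_`{|}~")


def encode85(data):
    """Encode bytes as base85: each 4-byte group becomes 5 characters."""
    out = []
    for i in range(0, len(data), 4):
        # A short final group is padded with zero bytes.
        chunk = data[i:i + 4].ljust(4, b"\0")
        acc = int.from_bytes(chunk, "big")
        digits = []
        for _ in range(5):
            acc, rem = divmod(acc, 85)
            digits.append(ALPHABET[rem])
        # Digits were produced least significant first, so reverse them.
        out.append("".join(reversed(digits)))
    return "".join(out)


def decode85(text, length):
    """Decode base85 text back into `length` bytes.

    Assumes valid input whose length is a multiple of 5. The caller passes
    the original byte count so the zero padding can be dropped again.
    """
    out = bytearray()
    for i in range(0, len(text), 5):
        acc = 0
        for ch in text[i:i + 5]:
            acc = acc * 85 + ALPHABET.index(ch)
        out += acc.to_bytes(4, "big")
    return bytes(out[:length])
```

Since the decoder needs the original length, the patch format has to carry
it somewhere, which is worth keeping in mind for the C-tests. Note also
that '$' is part of the alphabet, which is where the keyword worry in
"Applying binary patches" comes from.
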