On 7/16/12 8:57 PM, "Johan Corveleyn" <jcor...@gmail.com> wrote:
>On Mon, Jul 16, 2012 at 3:33 PM, C. Michael Pilato
><cmpil...@collab.net> wrote:
>> On 07/16/2012 08:11 AM, Bert Huijben wrote:
>>> As we couldn't think of a usage of the content I would suggest that
>>> we just always set the property to '*', just like how we handle
>>> svn:executable, svn:needs-lock, etc. This would also make sure that
>>> merges of this property won't need special handling.
>>
>> +1. Let's get the mechanics for recognizing branch roots in place
>> first. We can worry about additional policy matters later when we
>> have a better idea what our users might require.
>
>+1 to some support for branch identification. But I'm not sure if such
>a simple property will provide that in a good way. OTOH I don't have a
>better suggestion now.
>
>First, a couple of use cases I have in mind if branch-roots can be
>identified:

I've noticed this thread has a lot of focus on "how to identify branch
roots" but not so much on what to do with that information. What are
the specific use cases we want to address?

In my case, I had a client with a *huge* Subversion deployment; nearly
a thousand production repos, tens of thousands of sub-teams, hundreds
of gigabytes of data, etc. They had an awful experience with merging
in particular -- one of the most prominent teams had to essentially
stop development for a few weeks whilst the SCM admins tried to
unbreak their mergeinfo (amongst other things).

50+ developers at 10% productivity x 2+ weeks = expensive.
The-CIO-has-heard-about-this-and-is-super-pissed expensive.

My remit was simple: prevent this sort of problem from ever happening
again. From that, I extrapolated I needed to block a good 20-30
different types of commits that contributed to "repo entropy".
For example:

    - TagDirectoryCreatedManually
    - BranchDirectoryCreatedManually
    - BranchRenamedToTrunk
    - TrunkRenamedToBranch
    - TrunkRenamedToTag
    - BranchRenamedToTag
    - BranchRenamedOutsideRootBaseDir
    - TagSubtreePathRemoved
    - RenameAffectsMultipleRoots
    - UncleanRenameAffectsMultipleRoots
    - MultipleRootsCopied
    - TagCopied
    - UncleanCopy
    - FileRemovedFromTag
    - CopyKnownRootSubtreeToValidAbsRootPath
    - MixedRootsNotClarifiedByExternals
    - CopyKnownRootToIncorrectlyNamedRootPath
    - CopyKnownRootSubtreeToIncorrectlyNamedRootPath
    - UnknownPathRenamedToIncorrectlyNamedNewRootPath
    - RenamedKnownRootToIncorrectlyNamedRootPath
    - MixedChangeTypesInMultiRootCommit
    - CopyKnownRootToKnownRootSubtree

(Full list of events currently blocked by Enversion:
http://people.apache.org/~trent/events.py)

Other requirements:

    - Minimal administrative overhead. Requiring administrators to
      manually specify branch roots (or trying to use regexes when
      each repo had an entirely different layout) does not scale when
      you're dealing with thousands of repositories. Ditto for
      requiring dump/load dances.

    - 100% accuracy for branch identification.

    - No false negatives. Confidence in Subversion was at an all-time
      low -- many teams were threatening to ditch it and set up their
      own P4/git repo. If anything was introduced that made the user
      experience more painful, there would be anarchy.

With all of those requirements set in stone, I came up with the
evn:roots revprop approach. Which, I'm happy to report, has been
chugging along in production at this client's site for about 18 months
now. Enversion will now block some...
$(cat events.py | grep '^class ' | wc -l)... 100 different types of
commits that contribute to "repo entropy".

For the record, here's an outline of Enversion's evn:roots approach.
It took a few failed attempts before I came up with the design below...
but, as I mentioned, it's been in production for 18+ months on just
under a thousand repos and has met all the original requirements, so
I'm pretty happy with it.

1. Analyze the repository via `evnadmin analyze <repo>`. This
   processes rev 0 to HEAD sequentially.

2. "In the beginning, there was /trunk". I'm amazed how much mileage
   I got out of this idiom. Essentially, the only way to 'create' a
   root from scratch is to `svn mkdir .*/trunk`.

3. Once a .*/trunk mkdir is detected, an evn:roots entry is added for
   it in the revprop it was created in. For example, after analyzing
   r1 of the trac repo:

       % svn pg evn:roots --revprop -r1 `gru trac`
       {'/trunk/': {'copies': {},
                    'created': 1,
                    'creation_method': 'created'}}

4. When processing the next revision, the roots from the previous
   revprop are inherited in a simplified format:

       % svn pg evn:roots --revprop -r2 `gru trac`
       {'/trunk/': {'created': 1 }}

   i.e. the root name and the revision it was created in.

5. Analysis of each subsequent revision always inherits the previous
   revision's roots (in the simplified format).

6. With an up-to-date, definitive list of repository roots on hand
   each time we process a new revision, we can easily detect if a
   revision affects a root. A root can be affected in the following
   ways:

       - Copied (directly and indirectly).
       - Renamed (directly and indirectly).
       - Replaced (directly and indirectly).
       - Removed (directly and indirectly).

   During analysis, we process the revision and update the roots
   regardless of the action. However, once analysis is complete and
   the hooks are enabled, we can block all the crazy stuff.

   This is an important point -- even though the end goal is to
   eventually block dodgy commits, we have to process such commits
   during analysis and update evn:roots accordingly.

   You have no idea how complicated this actually is.
   There are about seven extreme corner cases that Enversion still
   bombs out on -- commits that I never would have thought even
   remotely possible until I saw them in the wild.

7. Once we detect a root is affected, evn:roots is updated
   accordingly. In trac@r175, a new tag is created. Specifically,
   trunk@175 is copied to /tags/trac-0.5-rc1. That results in two
   changes.

   First, the evn:roots of r175's revprop includes the new root:

       % svn pg evn:roots --revprop -r175 `gru trac`
       {'/tags/trac-0.5-rc1/': {'copied_from': ('/trunk/', 174),
                                'copies': {},
                                'created': 175,
                                'creation_method': 'copied'},
        '/trunk/': {'created': 1}}

   Second, we record that trunk was copied. This sort of metadata is
   always stored back in the revprop where the root was created, in
   this case, r1:

       % svn pg evn:roots --revprop -r1 `gru trac`
       {'/trunk/': {'copies': {174: [('/tags/trac-0.5-rc1/', 175)]},
                    'created': 1,
                    'creation_method': 'created'}}

   As analysis continues, the entry for /trunk/ in r1's evn:roots
   gets continually updated with relevant actions that affect it.

   If a root is detected as being removed (directly or indirectly)
   during analysis, a note is made in the originating evn:roots
   revprop that it was deleted (with reference to the rev it was
   removed in, and the type of removal, i.e. removed directly,
   removed indirectly due to an ancestor path being removed, etc.),
   and the root will no longer be inherited in future evn:roots.

As for Enversion, the good news is that it's free, open source, Apache
2.0 licensed and available on github. The bad news is that it's poorly
documented at the moment and the installation is a bit fiddly:

    % git clone https://github.com/tpn/enversion.git
    Cloning into 'enversion'...
    remote: Counting objects: 56, done.
    remote: Compressing objects: 100% (44/44), done.
    remote: Total 56 (delta 6), reused 55 (delta 5)
    Unpacking objects: 100% (56/56), done.
    % export PYTHONPATH=$PYTHONPATH:`pwd`/enversion
    % export PATH=$PATH:`pwd`/enversion/bin
    % evnadmin
    Type 'evnadmin <subcommand> help' for help on a specific subcommand.

    Available subcommands:
        analyze
        create
        disable-remote-debug (drd)
        doctest
        [snip]

If you get that far, you'll be able to create a new Enversion-enabled
repository, or analyze an existing one. See (the incredibly terse)
https://github.com/tpn/enversion/blob/master/doc/quick-start.rst for a
few more hints.

FWIW, Snakebite sucks up 110% of my time at the moment, so I'm having
to neglect Enversion a bit. I'll be ramping back up on it soon,
though. I'd still love to hear from people having a play with it.
It's production ready from a functionality perspective, but definitely
alpha quality from an installer/documentation/docstrings/unit-tests
perspective. Although that's primarily a result of only being funded
for 20 days rather than negligence on my part ;-)

Regards,

Trent.
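
P.S. For anyone who finds code easier to follow than prose, here is a
minimal Python sketch of the inherit-and-update bookkeeping from steps
3-7 above. It is not Enversion's actual code: the function names
(begin_revision, create_root, copy_root) and the in-memory `revprops`
dict standing in for real revprops are made up for illustration, and
it ignores renames, replaces and removals entirely. It just replays
the trac example (r1 mkdir of /trunk/, r175 copy to a tag).

```python
# Illustrative sketch only -- NOT Enversion's implementation.
# `revprops` is a hypothetical stand-in: revision number -> the
# evn:roots dict that would be stored in that revision's revprop.

def begin_revision(revprops, rev):
    """Start `rev` by inheriting the previous revision's roots in the
    simplified form: root path -> {'created': rev_created}."""
    prev = revprops.get(rev - 1, {})
    revprops[rev] = {path: {'created': info['created']}
                     for path, info in prev.items()}
    return revprops[rev]


def create_root(revprops, rev, path):
    """Record a root created from scratch (e.g. mkdir of .*/trunk)."""
    revprops[rev][path] = {'copies': {},
                           'created': rev,
                           'creation_method': 'created'}


def copy_root(revprops, rev, src_path, src_rev, dst_path):
    """Record that root src_path@src_rev was copied to dst_path in `rev`."""
    # 1. The new root appears in this revision's evn:roots.
    revprops[rev][dst_path] = {'copied_from': (src_path, src_rev),
                               'copies': {},
                               'created': rev,
                               'creation_method': 'copied'}
    # 2. The copy is noted back in the revprop where the source root
    #    was created, found via the inherited 'created' value.
    created_in = revprops[rev][src_path]['created']
    origin = revprops[created_in][src_path]
    origin['copies'].setdefault(src_rev, []).append((dst_path, rev))


# Replay the trac example: /trunk/ created in r1, tagged in r175.
revprops = {}
for rev in range(1, 176):
    begin_revision(revprops, rev)
    if rev == 1:
        create_root(revprops, rev, '/trunk/')
    elif rev == 175:
        copy_root(revprops, rev, '/trunk/', 174, '/tags/trac-0.5-rc1/')

for rev in (1, 2, 175):
    print(rev, revprops[rev])
```

The one design point the sketch tries to show is that every revision
carries only the lightweight {'created': N} pointer, while the full
history of a root lives in the revprop of the revision that created
it.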