Hi,

A bit of a status update on the wiki article:
http://wiki.apache.org/subversion/NonNormalizingUnicodeCompositionAwareness

Received some comments from Daniel, which I have tried to address. Thanks.

I have written a bash script which demonstrates the concept of
"Alternative 1" with regards to how the local_relpath column is handled by
checkout/update. From the wiki:

---
This alternative can be simulated using the attached script
localrelpath2nfd.sh. This provides a Working Copy equivalent to what a
checkout should produce if this alternative was implemented in Subversion
itself:

svn co ...
svn stat  #Shows any problematic items as missing and unversioned
localrelpath2nfd.sh
svn stat  #Should be clean apart from misperception that some items are switched
---

This script can be used to investigate how other subcommands are affected
and determine what needs to be done. It is possible to make commits but
updates to normalisation-dependent nodes will fail since this script is not
inside the update code.

I intend to use this script to take the design to the next level of detail.
First, I would like some feedback from people with in-depth knowledge of the
WC and preferably get some idea on what the community thinks about the
approach.

/Thomas Å.


On 26 mar 2012, at 04:14, Thomas Åkesson <tho...@akesson.cc> wrote:

> Hi,
> Sorry about the delay, had a release to sort out...
>
> I have moved the proposal into the wiki:
> http://wiki.apache.org/subversion/NonNormalizingUnicodeCompositionAwareness
>
> The comments from Julian and Markus have been implemented and I have added
> more information to the "Client Changes" section as well as more structure
> and TODO-notes.
>
> I would really appreciate if someone with more insight into WC-NG could
> provide input on some of the TODO items (or things that have been completely
> overlooked).
>
> Thanks,
> Thomas Å.
>
>
> On 21 feb 2012, at 09:55, Daniel Shahaf wrote:
>
>> I've granted you write access to the wiki.
>>
>> Thomas Åkesson wrote on Tue, Feb 14, 2012 at 12:36:23 +0100:
>>> Thanks Julian and Markus for providing feedback.
>>>
>>> I am not commenting below because all the feedback is very good and I will
>>> try to address it as best I can in the next iteration. Describing the
>>> behaviour changes to the WC is the most challenging since I lack that kind
>>> of detailed knowledge. I will instead try to draft the structure of that
>>> section to make it easier for someone with that level of detail to assist.
>>>
>>> Regarding use cases, what can I say... it was towards the end of a long
>>> stretch.
>>>
>>> I think it would help with the upcoming iterations if I could move this
>>> "document" into the wiki. If you find that this first draft shows promise,
>>> please consider granting edit access in the wiki. My user name is "Thomas
>>> Åkesson", which exercises the Unicode awareness of MoinMoin...
>>>
>>> /Thomas Å.
>>>
>>>
>>> On 14 feb 2012, at 11:25, Julian Foad wrote:
>>>
>>>> Hi Thomas. It's fantastic that you're taking the trouble to write up this
>>>> proposal. That's just what we need. Just a few initial comments below...
>>>>
>>>> Thomas Åkesson wrote:
>>>>
>>>>> Context
>>>>> ===
>>>>>
>>>>> [...] A unicode string (e.g. a file name) can be represented
>>>>> in 2 normalized forms (NFC/NFD) or mixed, i.e. multiple such
>>>>> characters where some are composed and others decomposed (rare).
>>>>
>>>>
>>>> What's "rare"? We have to assume that input is in mixed composition in
>>>> any system that doesn't explicitly normalize it, which (I think) includes
>>>> most operating systems. While it may be rare for any single string to
>>>> contain characters in both compositions, it is very common to be
>>>> processing a string that *might* have characters in both compositions --
>>>> in other words, that is not guaranteed to be normalized. I think it would
>>>> be clearer to drop the "(rare)" and just say "... normalized forms
>>>> (NFC/NFD) or mixed (not normalized).".
>>>>
>>>>
>>>>> A minority of file systems (currently Mac OS X HFS+ only) will
>>>>> normalize the paths.
>>>>> In the case of HFS+, the path will be
>>>>> normalized into NFD and it will even be given back that way when
>>>>> listing the filesystem.
>>>>
>>>>
>>>> Drop the word "even"? The statement is not surprising.
>>>>
>>>>
>>>> [...]
>>>>
>>>>> Similarities to case-sensitivity
>>>>> ===
>>>>>
>>>>> - If two Unicode strings differ only by letter case/composition,
>>>>
>>>> Drop "/composition" -- it's the subject of the following sentence.
>>>>
>>>>> on some computer systems they refer to the same file, while on
>>>>> other systems they refer to different files. The same applies
>>>>> if two Unicode strings differ only by composition.
>>>>
>>>>
>>>>> [...]
>>>>
>>>>> Client Changes
>>>>> ===
>>>>>
>>>>> [...] An abstraction between the repository path and the file
>>>>> system path can be achieved by ensuring that there is a column
>>>>> in wc.db that contains the file system path in exactly the same
>>>>> form that the file system gives back. APIs in wc needs to be
>>>>> extended to ensure that all interaction with the file system is
>>>>> performed with the file system path.
>>>>
>>>> [...]
>>>>
>>>> This part seems to be the heart of the whole proposal. You describe the
>>>> data that we need, but the behaviour will also need to be described in
>>>> detail. Presumably much of the behaviour is boring and obvious (when we
>>>> check out a new path and create it on disk, we store the disk path), but
>>>> I'm sure there will be some less obvious parts (do we need to find out
>>>> what the disk path of an 'excluded' node would be, even though we're not
>>>> actually creating it on disk, for example).
>>>>
>>>>
>>>>> Use Cases
>>>>> ===
>>>>>
>>>>> This change will only affect use cases which rely on creating
>>>>> paths that look like duplicates but use different unicode
>>>>> composition. It is highly unlikely anyone is relying on this..
>>>>
>>>>
>>>> Uh... it sounds like you are saying there are no interesting use cases for
>>>> this proposal!
>>>> No, on the contrary, this proposal also affects checking
>>>> out and using a WC on Mac HFS+ where the repository paths were created on
>>>> another system and are not in NFD, and it allows that case to work.
>>>> That's the more interesting use case, is it not? It's definitely worth
>>>> writing out the interesting case in full, including steps like checkout
>>>> (or update) that brings in a non-NFD path, create a new file on the Mac,
>>>> and commit.
>>>>
>>>> - Julian
>>>>
>>>
>
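
For anyone following the thread without the wiki page open, here is a
minimal Python sketch of the composition issue discussed above. It assumes
standard NFD as a stand-in for what HFS+ gives back (HFS+ uses its own
variant of decomposition, so this is only an approximation). The helper
names to_disk_path and same_modulo_composition and the example file name are
hypothetical; none of this is Subversion code or the localrelpath2nfd.sh
script.

```python
import unicodedata


def to_disk_path(repos_relpath: str) -> str:
    """Hypothetical helper: approximate the form an NFD-normalizing
    file system would hand back for a given repository path."""
    return unicodedata.normalize("NFD", repos_relpath)


def same_modulo_composition(a: str, b: str) -> bool:
    """True if two paths differ at most by Unicode composition."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The same visible name in two compositions (example name is made up).
composed = "dir/\u00c5kesson.txt"      # precomposed A-with-ring (NFC form)
decomposed = "dir/A\u030akesson.txt"   # "A" + combining ring above (NFD form)

# Assumed shape of the idea only: keep the repository path untouched and
# record next to it the path in the form the file system reports.
path_map = {composed: to_disk_path(composed)}

if __name__ == "__main__":
    print(composed == decomposed)                        # byte-wise comparison
    print(same_modulo_composition(composed, decomposed))  # composition-aware
    print(path_map[composed] == decomposed)
```

The point of keeping both forms side by side is that the repository path is
never rewritten, while all file system access can go through the second
form.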