2012/2/3 Julian Foad <julianf...@btopenworld.com>:
> You may well be correct that NFC is never longer than NFD, but that's not the
> question. The question is whether NFC may be longer than the current paths
> (which are not normalized to normalization form C or to form D). And the
> answer is yes it may be longer. See
> <http://unicode.org/faq/normalization.html#11>.

Oh, I didn't know that. Thanks for letting me know.
I also read all the other items in
<http://unicode.org/faq/normalization.html#11> and all of
<http://www.unicode.org/reports/tr15/> and learned more about
normalization.

Maybe we should revise the note.
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames

>
>> Here I quote from
>> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
>> > The proposed internal 'normal form' should be NFC, if only if
>> > it were because it's the most compact form of the two: when
>> > allocating memory to store a conversion result, it won't be
>> > necessary (ever) to allocate more than the size of the input buffer.
>
> That statement seems to be talking about converting between NFC and NFD, not
> from un-normalized to normalized.

Yes, indeed. So, we need to normalize input paths before processing.
We choose NFC as the normalization form.

-- 
)Hiroaki Nakamura) hnaka...@gmail.com
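
For illustration only, here is a minimal Python sketch of the point made in the thread: converting an un-normalized string to NFC can make it longer. It is not Subversion code. The helper names utf8_len and nfc_growth are hypothetical, and the sample character U+0958 (DEVANAGARI LETTER QA) is my own choice of a composition-excluded character, not one taken from the note or from the messages above.

```python
import unicodedata


def utf8_len(s: str) -> int:
    """Length of s in bytes when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def nfc_growth(path: str) -> int:
    """Bytes gained (positive) or saved (negative) by normalizing path to NFC."""
    return utf8_len(unicodedata.normalize("NFC", path)) - utf8_len(path)


# U+0958 is excluded from composition, so NFC leaves it decomposed as
# U+0915 U+093C: one 3-byte character becomes two 3-byte characters.
grows = nfc_growth("\u0958")

# "e" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9,
# so here NFC is shorter than the input.
shrinks = nfc_growth("e\u0301")

# Plain ASCII is already in NFC and is left unchanged.
unchanged = nfc_growth("trunk/notes")
```

So a buffer sized to the un-normalized input is not always large enough for the NFC result, which is the case the quoted statement in the note does not cover.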