Re: Comments on 'notes/unicode-composition-for-filenames'

Stefan Sperling Tue, 22 Feb 2011 10:57:20 -0800

On Tue, Feb 22, 2011 at 07:41:12PM +0100, Branko Čibej wrote:
> On 22.02.2011 18:17, Julian Foad wrote:
> >> Proposed Support Library
> >> ========================
> >>
> >>    Assumptions
> >>    -----------
> >>
> >>    The main assumption is that we'll keep using APR for character set
> > s/character set/character encoding/.
> >
> >>    conversion, meaning that the recoding solution to choose would not
> >>    need to provide any other functionality than recoding.
> > s/recoding/converting between NFD and NFC UTF8 encodings/.
> 
> Actually -- you have to go all the way and support complete
> normalization, even if your normalization targets are only NFC and NFD.
> That's because there isn't a sane way to detect whether a string is
> normalized or not -- "sane" in the sense that it should take about as
> long to discover that as to just normalize it.


To put it differently, the only way to figure out whether a given
UTF-8 sequence is valid (or, by extension, is in NFC and/or NFD form)
is to parse the entire sequence.
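
A minimal sketch of that point, in Python rather than the C/APR code
under discussion. The helper names is_valid_utf8() and
normalization_forms() are made up for illustration and are not part of
any proposed library. Both checks have to walk the whole input, and the
form check is done simply by normalizing and comparing, which is the
cost Branko describes:

```python
import unicodedata


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8; this scans the whole input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def normalization_forms(data: bytes) -> set:
    """Return which of NFC/NFD the UTF-8 input already conforms to.

    Hypothetical helper: it normalizes the full string and compares the
    result with the input, so detection costs as much as normalization.
    """
    text = data.decode("utf-8")
    return {
        form
        for form in ("NFC", "NFD")
        if unicodedata.normalize(form, text) == text
    }


# U+010C (C with caron) is a single precomposed code point.
composed = "\u010cibej".encode("utf-8")
# In NFD it becomes 'C' followed by U+030C (combining caron).
decomposed = unicodedata.normalize("NFD", "\u010cibej").encode("utf-8")
```

A pure-ASCII name is in both forms at once, which is why the helper
returns a set rather than a single answer.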
