On Sat, Apr 18, 2015 at 10:50:18AM -0700, Walter Bright via Digitalmars-d wrote:
> On 4/18/2015 4:35 AM, Jacob Carlborg wrote:
> >\u0301 is the "combining acute accent" [1].
> >
> >[1] http://www.fileformat.info/info/unicode/char/0301/index.htm
>
> I won't deny what the spec says, but it doesn't make any sense to have
> two different representations of eacute, and I don't know why anyone
> would use the two code point version.

Well, *somebody* has to convert it to the single code point eacute,
whether it's the human (if the keyboard has a single key for it), or the
code interpreting keystrokes (the user may have typed it as e +
combining acute), or the program that generated the combination, or the
program that receives the data. When we don't know the provenance of
incoming data, we have to assume the worst and run normalization to be
sure that we got it right.

The two code-point version may also arise from string concatenation, in
which case normalization has to be done again (or possibly only from the
point of concatenation, given the right algorithms).


T

-- 
Mediocrity has been pushed to extremes.
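
To make the concatenation case concrete, here is a minimal sketch. It uses Python's standard `unicodedata` module rather than D, purely for illustration; the helper `nfc` and the sample strings are hypothetical names invented for this example, not anything from the thread. It shows two strings that are each already in NFC whose concatenation is not.

```python
import unicodedata


def nfc(s: str) -> str:
    """Return s in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", s)


# The two representations of eacute discussed in this thread.
composed = "\u00e9"     # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # two code points: 'e' + COMBINING ACUTE ACCENT

# They compare unequal code point by code point until normalized.
assert composed != decomposed
assert nfc(decomposed) == composed

# Each piece is unchanged by NFC, i.e. already normalized on its own...
left = "cafe"
right = "\u0301!"  # starts with a combining mark, nothing to compose with
assert nfc(left) == left
assert nfc(right) == right

# ...but the concatenation is not: the trailing 'e' of `left` now
# composes with the leading combining accent of `right`.
joined = left + right
assert nfc(joined) != joined
assert nfc(joined) == "caf\u00e9!"
```

The sketch simply re-normalizes the whole joined string. Restricting the work to the neighbourhood of the join would need a smarter algorithm, which is not attempted here.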
