Ernest Cline
Fri, 16 Apr 2004 07:08:24 -0700
> [Original Message]
> From: Antoine Leca <[EMAIL PROTECTED]>
>
> ... it is vastly more easy to keep the obvious unification, rather than
> trying to distort it and trying to make a conditional mapping, if
> Mathematics, · => U+00B7, if Catalan, · => U+2027, if NoSeQue, · =>
> some_other_random_middle_dot, etc. Unlike hyphenation rules (where the
> mapping might very well be · => U+2027, by the way), which are pretty easy
> to pinpoint, tagging Catalan in bulk text is clearly not a easy task. Even
> when considering the fairly restrictive rules for it to occur (requiring
> NFC):
I don't see that as being any worse than the set of HYPHEN-MINUS, HYPHEN, MINUS SIGN, etc., which, depending upon your taste in such matters, could be seen as an example of what to do or what not to do.

That said, let me switch the topic to something almost completely different. Given the nature of U+0140 (and U+013F) when hyphenated, might it not be a good idea to assign these two characters their own Line Break class for the Line Breaking Algorithm of UAX #14?

These two characters, if I understand the comments correctly, always provide a line breaking opportunity after them, but if that line break opportunity is taken, the dot must disappear, so an implementation that is not prepared to remove the dot should ignore the opportunity.
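
To make the behaviour I have in mind concrete, here is a rough Python sketch. It is only an illustration, not anything UAX #14 specifies: the function name, the can_remove_dot flag, and the choice to mark the break with a HYPHEN-MINUS are all my own assumptions.

```python
# Hypothetical sketch: none of these names come from UAX #14.
# Maps the L-with-middle-dot characters to their dotless form.
DOTLESS = {
    "\u0140": "l",  # LATIN SMALL LETTER L WITH MIDDLE DOT
    "\u013F": "L",  # LATIN CAPITAL LETTER L WITH MIDDLE DOT
}


def break_after_l_dot(word, index, can_remove_dot=True):
    """Try to take the break opportunity after word[index].

    Returns (end_of_line, start_of_next_line), or None when the
    implementation cannot remove the dot and so must ignore the
    opportunity.
    """
    ch = word[index]
    if ch not in DOTLESS:
        raise ValueError("no U+013F/U+0140 at the given index")
    if not can_remove_dot:
        # Not prepared to drop the dot: treat as no opportunity.
        return None
    # The dot disappears; a hyphen (assumed here) marks the break.
    end_of_line = word[:index] + DOTLESS[ch] + "-"
    start_of_next_line = word[index + 1:]
    return end_of_line, start_of_next_line
```

With this sketch, break_after_l_dot("co\u0140legi", 2) splits the word into "col-" and "legi", while passing can_remove_dot=False returns None, so the opportunity is simply skipped.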