RE: Conflicting principles

2003-08-14 Thread Kent Karlsson
Anyway, John J, what code are we talking about that has to work from the positions of the combining marks back to the underlying representation? Are you talking about OCR? No, the issue is more how to start from a base form and work forward to encompass the whole series of

Re: Conflicting principles

2003-08-14 Thread Peter Kirk
On 07/08/2003 13:57, John Cowan wrote: Kent Karlsson scripsit: 4) Encode the vowel signs as combining characters, after the base characters they logically follow. Consider them as double [width] combining characters, that happen to have no ink above/below the character they apply to,

Re: Conflicting principles

2003-08-14 Thread Michael Everson
At 01:18 +0200 2003-08-09, Philippe Verdy wrote: Such breaks in the middle of a multiple-width diacritic exist in some notations, and are not considered horrible typography. Just look at musical notations where an upper horizontal parenthesis is used to group some elements [...] Music setting is

RE: Conflicting principles

2003-08-14 Thread Jon Hanna
what code are we talking about that has to work from the positions of the combining marks back to the underlying representation? Such code is not just common and widespread, it is practically ubiquitous. The principle of base characters always coming first is used: Whenever you need to

Re: Conflicting principles

2003-08-14 Thread Kenneth Whistler
John Cowan asked: I would like to ask the old farts^W^Wrespected elders of the UTC which principle they consider more important, abstractly speaking: the principle that combining marks always follow their base characters (a typographical principle), or that text is stored, with a few minor

Re: Conflicting principles

2003-08-14 Thread Peter Kirk
On 06/08/2003 14:04, John Jenkins wrote: Speaking purely as an old fart, I'd say the former. We already break the latter principle in Thai and Lao, and having to be prepared to scan either forward or backward from a base character in order to find its combining marks would add overhead to a lot

RE: Conflicting principles

2003-08-14 Thread ekeown
Madison Hi, Only two people asked me what else exists in the complete Hebrew character set, but maybe others care. The significant points here are that there are other pointing systems to be combined with base letters and that there are manuscripts that have TWO pointing systems

Re: Conflicting principles

2003-08-14 Thread Philippe Verdy
On Friday, August 08, 2003 9:16 PM, Peter Kirk wrote: On 07/08/2003 13:57, John Cowan wrote: ... But an immediate problem comes to mind: what if there is a line break between the two base characters? What if there is a line break between the two characters joined by a

RE: Conflicting principles

2003-08-14 Thread Michael Everson
Ken's point of course is that however bizarre the backing store for Sindarin and English Tengwar modes may be, combining characters per se must follow their base characters no matter what. -- Michael Everson * * Everson Typography * * http://www.evertype.com

Re: Conflicting principles

2003-08-14 Thread Philippe Verdy
On Thursday, August 07, 2003 11:29 PM, Michael Everson wrote: Ken's point of course is that however bizarre the backing store for Sindarin and English Tengwar modes may be, combining characters per se must follow their base characters no matter what. Even if that breaks the

RE: Conflicting principles

2003-08-14 Thread Kent Karlsson
Collation isn't really based on combining sequences (even though UTS 10 specifies a certain spanning over non-blocking (combining) This is a very ignorant question: where in your public documentation are these issues discussed? ... I still don't understand even what happens with basic

Re: Conflicting principles

2003-08-14 Thread John Cowan
Peter Kirk scripsit: Sure. A line-break like pre- posterous would be encoded in English-mode Tengwar with the e vowel over the p consonant at the beginning of the second line. Well, I'm not sure what Unicode specifies on word breaks with hyphenations, Please disregard the hyphen: it has

Re: Conflicting principles

2003-08-14 Thread John Jenkins
Speaking purely as an old fart, I'd say the former. We already break the latter principle in Thai and Lao, and having to be prepared to scan either forward or backward from a base character in order to find its combining marks would add overhead to a lot of code, including existing code. On

Re: Conflicting principles

2003-08-14 Thread Peter Kirk
On 06/08/2003 16:13, Michael Everson wrote: At 15:18 -0700 2003-08-06, Kenneth Whistler wrote: As someone or other said, I believe that hitherto -- *hitherto,* mark you -- [we have] entirely overlooked the existence of, well, scripts that might cause a conflict between these esteemed

Re: Conflicting principles

2003-08-14 Thread John Cowan
Peter Kirk scripsit: What if there is a line break between the two characters joined by a double width combining character? That would be unbelievably atrocious typography. Double-width CCs are a hack, but a useful hack. Creating a factitious double-width CC that is actually only single

RE: Conflicting principles

2003-08-14 Thread Kent Karlsson
And it would starkly illustrate the fact that an appropriate character encoding does not necessarily directly reflect the phonological structure of a language as represented by that script. Not necessarily is the operative word. The question is whether that failure to reflect is

Re: Conflicting principles

2003-08-14 Thread Peter Kirk
On 08/08/2003 13:07, John Cowan wrote: Peter Kirk scripsit: Sure. A line-break like pre- posterous would be encoded in English-mode Tengwar with the e vowel over the p consonant at the beginning of the second line. Well, I'm not sure what Unicode specifies on word breaks with

Re: Conflicting principles

2003-08-12 Thread John Jenkins
On Wednesday, August 6, 2003, at 3:53 PM, Peter Kirk wrote: This answer presupposes that there is a well-defined concept of which base character a combining mark belongs to. That is not always true. The particular combining mark which precipitated the debate may be situated above the gap

RE: Conflicting principles

2003-08-10 Thread Kent Karlsson
Kent Karlsson scripsit: 4) Encode the vowel signs as combining characters, after the base characters they logically follow. Consider them as double [width] combining characters, that happen to have no ink above/below the character they apply to, but (like double width

Re: Conflicting principles

2003-08-09 Thread Kenneth Whistler
Philippe, Just look at musical notations where an upper horizontal parenthesis is used to group some elements (sorry I don't know how you name it exactly in English or Italian), even though there's a measure break in the middle, which may span to the other musical line: you end up with two parts

Re: Conflicting principles

2003-08-09 Thread Peter Kirk
On 08/08/2003 12:35, John Cowan wrote: Peter Kirk scripsit: What if there is a line break between the two characters joined by a double width combining character? That would be unbelievably atrocious typography. Double-width CCs are a hack, but a useful hack. Creating a factitious

RE: Conflicting principles

2003-08-08 Thread Michael Everson
At 23:07 +0200 2003-08-07, Kent Karlsson wrote: Kent Karlsson scripsit: 4) Encode the vowel signs as combining characters, after the base characters they logically follow. Consider them as double [width] combining characters, that happen to have no ink above/below the character

RE: Conflicting principles

2003-08-08 Thread ekeown
Elaine Keown Madison WI how to start from a base form and work forward to encompass the whole series of characters which need to be treated as one in certain processes, which can include cursor movement, hit testing, display, line breaking, collation, normalization.
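The forward scan described above (start at a base character and take in every mark that follows it) can be sketched in a few lines. This is only an illustration under stated assumptions, not the method any poster in the thread proposed: the function name `combining_sequences` is hypothetical, and "combining mark" is approximated as any character whose Unicode general category begins with "M". Real cursor movement or line breaking would also need the grapheme cluster rules, which this sketch ignores.

```python
import unicodedata


def combining_sequences(text):
    """Group each base character with the combining marks that follow it.

    Assumption: a "mark" is any character whose general category starts
    with "M" (Mn, Mc, Me). A mark with no preceding base starts its own group.
    """
    groups = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and groups:
            # Scan forward: attach the mark to the most recent base character.
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


if __name__ == "__main__":
    # Hebrew bet + dagesh + sheva, followed by a bare resh.
    for group in combining_sequences("\u05D1\u05BC\u05B0\u05E8"):
        print([f"U+{ord(c):04X}" for c in group])
```

Because marks are stored after their base, a single left-to-right pass suffices; if marks could also precede the base, the loop would need lookahead or a second pass, which is the overhead the thread is concerned about.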

Re: Conflicting principles

2003-08-08 Thread Philippe Verdy
On Saturday, August 09, 2003 1:33 AM, Michael Everson wrote: At 01:18 +0200 2003-08-09, Philippe Verdy wrote: Such breaks in the middle of a multiple-width diacritic exist in some notations, and are not considered horrible typography. Just look at musical notations where a

Re: Conflicting principles

2003-08-08 Thread John Cowan
Kent Karlsson scripsit: 4) Encode the vowel signs as combining characters, after the base characters they logically follow. Consider them as double [width] combining characters, that happen to have no ink above/below the character they apply to, but (like double width combining

Re: Conflicting principles

2003-08-07 Thread Michael Everson
At 15:18 -0700 2003-08-06, Kenneth Whistler wrote: As someone or other said, I believe that hitherto -- *hitherto,* mark you -- [we have] entirely overlooked the existence of, well, scripts that might cause a conflict between these esteemed principles. The reason why the UTC should tackle the

Re: Conflicting principles

2003-08-07 Thread John Cowan
Kenneth Whistler scripsit: Is a right-to-left script encoded in visual order in the backing store or in phonetic (= logical) order? I've always thought this term visual order was productive of nothing but confusion. I realize that there's precedent in the 8859-x RFCs for its use, but

Re: Conflicting principles

2003-08-07 Thread Rick McGowan
John C asked... I would like to ask the old farts^W^Wrespected elders of the UTC which principle they consider more important, abstractly speaking: the principle that combining marks always follow their base characters (a typographical principle), or that text is stored, with a few minor

Re: Conflicting principles

2003-08-07 Thread Peter Kirk
On 06/08/2003 16:12, John Jenkins wrote: On Wednesday, August 6, 2003, at 3:53 PM, Peter Kirk wrote: This answer presupposes that there is a well-defined concept of which base character a combining mark belongs to. That is not always true. The particular combining mark which precipitated the

Re: Conflicting principles

2003-08-06 Thread Michael Everson
At 16:16 -0400 2003-08-06, John Cowan wrote: I would like to ask the old farts^W^Wrespected elders of the UTC which principle they consider more important, abstractly speaking: the principle that combining marks always follow their base characters (a typographical principle), or that text is