Anyway, John J, what code are we talking about that has to work from
the positions of the combining marks back to the underlying
representation? Are you talking about OCR?
No, the issue is more how to start from a base form and work forward to
encompass the whole series of
On 07/08/2003 13:57, John Cowan wrote:
Kent Karlsson scripsit:
4) Encode the vowel signs as combining characters, after
the base characters they logically follow. Consider them as
double [width] combining characters, that happen to
have no ink above/below the character they apply to,
At 01:18 +0200 2003-08-09, Philippe Verdy wrote:
Such a break in the middle of a multiple-width diacritic exists in some
notations, and is not considered horrible typography. Just look
at musical notations where an upper horizontal parenthesis
is used to group some elements [...]
Music setting is
what code are we talking about that has to work from the
positions of the combining marks back to the underlying representation?
Such code is not just common and widespread, it is practically ubiquitous.
The principle of base characters always coming first is used:
Whenever you need to
John Cowan asked:
I would like to ask the old farts^W^Wrespected elders of the UTC
which principle they consider more important, abstractly speaking:
the principle that combining marks always follow their base characters
(a typographical principle), or that text is stored, with a few minor
On 06/08/2003 14:04, John Jenkins wrote:
Speaking purely as an old fart, I'd say the former. We already break
the latter principle in Thai and Lao, and having to be prepared to scan
either forward or backward from a base character in order to find its
combining marks would add overhead to a lot
Madison
Hi,
Only two people asked me what else exists
in the complete Hebrew character set, but
maybe others care.
The significant points here are that there are
other pointing systems to be combined with base
letters and that there are manuscripts that have
TWO pointing systems
On Friday, August 08, 2003 9:16 PM, Peter Kirk [EMAIL PROTECTED] wrote:
On 07/08/2003 13:57, John Cowan wrote:
... But an immediate problem comes to mind: what if there is a
line break between the two base characters?
What if there is a line break between the two characters joined by a
Ken's point of course is that however bizarre the backing store for
Sindarin and English Tengwar modes may be, combining characters per
se must follow their base characters no matter what.
--
Michael Everson * * Everson Typography * * http://www.evertype.com
On Thursday, August 07, 2003 11:29 PM, Michael Everson [EMAIL PROTECTED] wrote:
Ken's point of course is that however bizarre the backing store for
Sindarin and English Tengwar modes may be, combining characters per
se must follow their base characters no matter what.
Even if that breaks the
Collation isn't really based on combining sequences (even though UTS 10
specifies a certain spanning over non-blocking (combining)
This is a very ignorant question: where in your public documentation
are these issues discussed?
...
I still don't understand even what happens with basic
Peter Kirk scripsit:
Sure. A line-break like pre-
posterous would be encoded in English-mode Tengwar with the e vowel over
the p consonant at the beginning of the second line.
Well, I'm not sure what Unicode specifies on word breaks with hyphenations,
Please disregard the hyphen: it has
Speaking purely as an old fart, I'd say the former. We already break
the latter principle in Thai and Lao, and having to be prepared to scan
either forward or backward from a base character in order to find its
combining marks would add overhead to a lot of code, including existing
code.
On
On 06/08/2003 16:13, Michael Everson wrote:
At 15:18 -0700 2003-08-06, Kenneth Whistler wrote:
As someone or other said, I believe that hitherto -- *hitherto,* mark
you -- [we have] entirely overlooked the existence of, well, scripts
that might cause a conflict between these esteemed
Peter Kirk scripsit:
What if there is a line break between the two characters joined by a
double width combining character?
That would be unbelievably atrocious typography. Double-width CCs are a
hack, but a useful hack. Creating a factitious double-width CC that is
actually only single
And it would starkly illustrate
the fact that an appropriate character encoding does not
necessarily directly reflect the phonological structure of
a language as represented by that script.
Not necessarily is the operative word. The question is whether that
failure to reflect is
On 08/08/2003 13:07, John Cowan wrote:
Peter Kirk scripsit:
Sure. A line-break like pre-
posterous would be encoded in English-mode Tengwar with the e vowel over
the p consonant at the beginning of the second line.
Well, I'm not sure what Unicode specifies on word breaks with
On Wednesday, August 6, 2003, at 3:53 PM, Peter Kirk wrote:
This answer presupposes that there is a well-defined concept of which
base character a combining mark belongs to. That is not always true.
The particular combining mark which precipitated the debate may be
situated above the gap
Kent Karlsson scripsit:
4) Encode the vowel signs as combining characters, after
the base characters they logically follow. Consider them as
double [width] combining characters, that happen to
have no ink above/below the character they apply to,
but (like double width
Philippe,
Just look at musical notations where an upper horizontal parenthesis
is used to group some elements (sorry I don't know how you name
it exactly in English or Italian), even though there's a measure break
in the middle, which may span to the other musical line: you end
up with two parts
On 08/08/2003 12:35, John Cowan wrote:
Peter Kirk scripsit:
What if there is a line break between the two characters joined by a
double width combining character?
That would be unbelievably atrocious typography. Double-width CCs are a
hack, but a useful hack. Creating a factitious
At 23:07 +0200 2003-08-07, Kent Karlsson wrote:
Kent Karlsson scripsit:
4) Encode the vowel signs as combining characters, after
the base characters they logically follow. Consider them as
double [width] combining characters, that happen to
have no ink above/below the character
Elaine Keown
Madison WI
how to start from a base form and work forward to
encompass the whole series of characters which need to be treated as
one in certain processes, which can include cursor movement, hit
testing, display, line breaking, collation, normalization.
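The forward scan described above can be illustrated with a minimal sketch. It assumes that any character whose general category begins with "M" is a combining mark attached to the nearest preceding base character. That is a simplification: it is not the full grapheme cluster segmentation of UAX #29 and it says nothing about double-width marks. The function name `combining_sequences` is hypothetical and not taken from any message in this thread.

```python
import unicodedata


def combining_sequences(text):
    """Group each base character with the combining marks that follow it.

    Assumption: a character is a combining mark if its Unicode general
    category starts with "M" (Mn, Mc, Me). A mark with no preceding base
    starts a sequence of its own.
    """
    sequences = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and sequences:
            # Marks follow their base, so attach to the sequence in progress.
            sequences[-1] += ch
        else:
            # A base character (or a mark with no base) opens a new sequence.
            sequences.append(ch)
    return sequences


if __name__ == "__main__":
    # "e" + COMBINING ACUTE ACCENT, then a plain "a"
    for seq in combining_sequences("e\u0301a"):
        print([f"U+{ord(c):04X}" for c in seq])
```

Because marks always follow their base, the loop only ever looks at the sequence currently being built and never scans backward. That one-directional scan is what processes such as cursor movement or hit testing would rely on under this assumption.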
On Saturday, August 09, 2003 1:33 AM, Michael Everson [EMAIL PROTECTED] wrote:
At 01:18 +0200 2003-08-09, Philippe Verdy wrote:
Such a break in the middle of a multiple-width diacritic exists in some
notations, and is not considered horrible typography. Just look
at musical notations where a
Kent Karlsson scripsit:
4) Encode the vowel signs as combining characters, after
the base characters they logically follow. Consider them as
double [width] combining characters, that happen to
have no ink above/below the character they apply to,
but (like double width combining
At 15:18 -0700 2003-08-06, Kenneth Whistler wrote:
As someone or other said, I believe that hitherto -- *hitherto,* mark
you -- [we have] entirely overlooked the existence of, well, scripts
that might cause a conflict between these esteemed principles.
The reason why the UTC should tackle the
Kenneth Whistler scripsit:
Is a right-to-left script encoded in visual order in
the backing store or in phonetic (= logical) order?
I've always thought this term visual order was productive of
nothing but confusion. I realize that there's precedent in the
8859-x RFCs for its use, but
John C asked...
I would like to ask the old farts^W^Wrespected elders of the UTC
which principle they consider more important, abstractly speaking:
the principle that combining marks always follow their base characters
(a typographical principle), or that text is stored, with a few minor
On 06/08/2003 16:12, John Jenkins wrote:
On Wednesday, August 6, 2003, at 3:53 PM, Peter Kirk wrote:
This answer presupposes that there is a well-defined concept of which
base character a combining mark belongs to. That is not always true.
The particular combining mark which precipitated the
At 16:16 -0400 2003-08-06, John Cowan wrote:
I would like to ask the old farts^W^Wrespected elders of the UTC
which principle they consider more important, abstractly speaking:
the principle that combining marks always follow their base characters
(a typographical principle), or that text is