From: "Kenneth Whistler" <[EMAIL PROTECTED]>
> That last fact should be taken as a hint that for most
> purposes, manual leaders should just be sequences of FULL STOP
> characters (as you will see, for instance in the plain text
> representations of Internet Drafts or RFCs, for example).
> But in any rich text format, leaders are styled formatting objects
> (somewhat similar to tabulations, as Philippe suggested), but
> that does *not* make U+2024 a format character (LEADER
> PLACEHOLDER, or whatever). It is exactly what it claims to
> be: a compatibility character, punctuation, with a single
> baseline dot as its glyph.

What surprises me most in the Unicode spec is that it says the purpose of this 
character is to create leaders of arbitrary length (you say that the spacing statement 
in the Xerox name was not considered important by Xerox, so how many leaders would be 
needed to fill an en space under the Unicode designation?). Why then do you insist that 
it represents one dot? You also seem to insist on the "compatibility" decomposition, 
which normally removes an important semantic (otherwise it would be canonical).
All this seems to create contradictions.

It would also be the only punctuation sign whose number of occurrences is not relevant 
(in dotted lines used as leaders), since the final presentation of the text must 
compensate for font metric differences in order to produce the correct effect (also 
because the size of the dots was removed from the Unicode designation).
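
To illustrate why the count is a rendering detail rather than text content, here is a 
minimal sketch. The function name and the advance widths are hypothetical, chosen only 
for illustration, and the model (whole dots packed into a fixed gap) is a simplification 
of what a real layout engine would do.

```python
import math


def leader_dot_count(gap_width: float, dot_advance: float) -> int:
    """Return how many whole leader dots fit in a gap.

    Both arguments must use the same unit (for example, points).
    Hypothetical helper: real layout engines also handle alignment
    and spacing between dots.
    """
    if dot_advance <= 0:
        raise ValueError("dot_advance must be positive")
    return max(0, math.floor(gap_width / dot_advance))


# The same gap filled with dots from two hypothetical fonts.
gap = 120.0
count_font_a = leader_dot_count(gap, 3.0)  # narrower dot advance
count_font_b = leader_dot_count(gap, 5.0)  # wider dot advance
print(count_font_a, count_font_b)
```

The same leader needs a different number of dots in each font, so a fixed count stored 
in the text cannot be right for both.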

I do not agree with your argument that it is like a full stop to be used in limited 
applications (if Unicode removed the spacing, it was to generalize its use as an 
abstract character, not to reinforce its mapping to an approximate full stop).

Compatibility decompositions are not intended to represent exactly the same semantics 
between the "composed" character and the base characters in the decomposition. I 
think that compatibility decompositions are only acceptable fallbacks when the initial 
character is not supported, but they do not represent the same abstract characters. At 
least this was true before the decomposition stability "pact", but it is less clear now 
that roundtrip convertibility with some encodings is favored over exact character 
abstraction.
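
For reference, the properties under discussion can be inspected with Python's standard 
`unicodedata` module. This is only a sketch; the `describe` helper is a name introduced 
here for illustration, and the values reported depend on the Unicode data bundled with 
the Python version in use.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect the character properties relevant to this discussion."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        # Empty string means no decomposition; "<compat> ..." marks a
        # compatibility (not canonical) decomposition.
        "decomposition": unicodedata.decomposition(ch),
        # Canonical normalization keeps the character as-is.
        "nfc": unicodedata.normalize("NFC", ch),
        # Compatibility normalization replaces it with its fallback.
        "nfkc": unicodedata.normalize("NFKC", ch),
    }


# ONE DOT LEADER, TWO DOT LEADER, HORIZONTAL ELLIPSIS
for cp in (0x2024, 0x2025, 0x2026):
    info = describe(chr(cp))
    print(f"U+{cp:04X}", info)
```

Canonical normalization (NFC) leaves U+2024 untouched, while compatibility 
normalization (NFKC) replaces it with U+002E FULL STOP, which loses the distinction 
being argued about here.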

I had never heard of the Xerox CCS before, but there is a large legacy usage of the 
ellipsis as a single unbreakable character (and the two dots used to denote interval 
bounds are also unbreakable). The one dot leader looks like a way to fill the gap, 
only because the two-dot and three-dot ellipses did not allow, in most fonts and 
applications, creating a regular leader using smaller dots than the one used for the 
regular full stop punctuation.

The fact that it was unified with XCCS (with some compromises accepted by Xerox) 
clearly demonstrates that the Xerox design was not the main focus:
- Who knows XCCS and uses it? Very few people.
- Who uses leaders? Every publisher and author of long documents who does not want to 
see irregularly spaced leaders, or a dotted grid instead of a true dotted horizontal 
line.

Leaders are visual helpers for the reader's eye; they have absolutely no punctuation 
or symbolic semantics (unlike the two-dot symbol or the ellipsis). The fact that it 
was categorized as punctuation is probably an initial error that can't be corrected 
and that comes from the classification of its approximate fallback "compatibility 
decomposition".

I do not see it as a compatibility character needed for roundtrip conversions with 
legacy sets (even if XCCS was mapped this way after some compromises). Pure roundtrip 
conversions respect the initial design of the legacy set from which a character is 
mapped.

So you seem to mix up the very distinct concepts of compatibility characters and 
compatibility decompositions:

- compatibility characters are for the initial mapping from an important legacy 
encoding with full roundtrip, and the exact semantics are preserved in this mapping to 
Unicode. The usage of these Unicode codepoints is discouraged outside this legacy usage.

- characters that have compatibility decompositions are given those decompositions as 
guides to acceptable fallback characters that will not be too confusing for readers, 
but the exact semantics are not preserved by the compatibility decomposition. Their 
usage is not discouraged but instead favored by Unicode, which adds important semantics 
in the "composed" character.

