Philippe Verdy vamped:

> > > For example I would not be shocked if a text using it was rendered with
> > > a monospaced font, where the base line of the character cell shows
> > > multiple tiny dots, that create a contiguous dotted line when multiple
> > > U+2024 characters (one per display cell) are used to indent the text in
> > > columns.
> > >
> > > Of course with proportional fonts this character would display at least
> > > (and preferably) a single dot. Any use of this character that assumes
> > > it is a symbol consisting in a single dot aligned on the baseline seems
> > > to abuse the semantic of this character, which is not a punctuation,
> > > but really a styling character used instead of an "invisible" thin
> > > space.

And Jim Allan asked:

> > Where is this behavior indicated by Unicode specifications?
> >
> > Such behavior appears to me to be a non-standard extension on Unicode,
> > interpreting what Unicode classes as a General Punctuation character as
> > instead a Formatting Character.
> >
> > But I don't see how conforming applications could assume this semantic
> > for the character when reading in plain text Unicode or writing plain
> > text Unicode.
> >
> > What then is U+2025 TWO DOT LEADER?

And then Philippe Verdy continued to improvise:

> For me this one is a punctuation, commonly used to designate
> a separator between bounds of intervals like [0..1] (it is
> generally surrounded by a thin space on both sides with strict
> typography). It should not be used to create arbitrary lengths
> of leaders.

What he is talking about here is generally represented by the
sequence <U+002E, U+002E>, in other words, just two full stops,
as in the example given "[0..1]". Typographical rules then deal
with any issues of spacing around or between the dots.

> The three dot leader is also a punctuation (normally not
> prefixed by any space, but followed by a large space like
> for the full dot). It should not be used to create arbitrary
> lengths of leaders.

This is a reference to U+2026 HORIZONTAL ELLIPSIS, and Philippe
is correct that that should not be used to create arbitrary
leaders.

> The one-dot leader should have no other purpose than to be
> used in sequences of arbitrary length.

This statement is only very accidentally true. Explanation below.

> The whole sequence of single-dots leaders like this forms a
> single token with the semantic of a word separator, where the
> number of displayed dots is not really relevant for the reader
> of text whatever is rendering style or fonts.

But this is absolutely false, as Jim Allan suggested.

U+2024 ONE DOT LEADER is a graphic character, whose glyph
consists of a small baseline dot, and whose General Category
is Po (Other Punctuation).
It cannot be used conformantly as if it were a formatting control
standing in for a rich text representation of a leader object
(e.g. in a generated Table of Contents in a Word or FrameMaker
document).

> I just think that this 1-dot leader is used as a way to transcode
> within a single string what was initially a tabulation decorated
> by some markup system,

False.

Now, here is the true story of U+2024.

It is a compatibility character, introduced for compatibility
with XCCS (Xerox Character Code Standard) 1980, where it was
mapped to the coded character 356B/242B (0xEEA2), described
as "Leader, one-dot on an en body".

Its use in XCCS would have been to create leaders manually,
by lining up a sequence of "one-dot on an en body" to create
a sufficiently long leader. Its rationale in Unicode would be
to either map to data created in XCCS or to manually lay out
text using a comparable mechanism, but for which one wished
to distinguish the "dots" thus used from U+002E FULL STOP.

U+2025 TWO DOT LEADER is also an XCCS compatibility character.
It corresponds to XCCS 356B/243B (0xEEA3) "Leader, two-dot on
an en body" *and* to 041B/105B (0x2145) "Leader, two-dot on an
em body". The difference in width was considered a formatting
distinction and was unified away in creating the U+2025 encoded
character, as preserving that distinction in plain text was
considered unnecessary by the Xerox representative to the
committee at the time.

U+2026 HORIZONTAL ELLIPSIS maps to the ellipsis seen in a number
of legacy character encodings, including the Macintosh character
sets, but also maps to an XCCS character: 041B/104B (0x2144)
"Leader, three-dot on an em body".

All *three* of these characters should be considered compatibility
characters. Indeed, they formally *are* "compatibility decomposable
characters" (Chapter 3, Definition D21), since they each have
compatibility decompositions to one or more U+002E FULL STOP
characters.

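For anyone who wants to check the General Category and the compatibility
decompositions for themselves, here is a minimal sketch in Python using the
standard unicodedata module. The names LEADERS and describe are just
illustrative, and the values reported depend on the version of the Unicode
Character Database bundled with your Python build.

```python
import unicodedata

# The three XCCS-derived characters discussed in this thread:
# ONE DOT LEADER, TWO DOT LEADER, HORIZONTAL ELLIPSIS.
LEADERS = ["\u2024", "\u2025", "\u2026"]


def describe(ch: str) -> dict:
    """Collect the character properties relevant to this discussion."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # General Category, e.g. "Po" for Other Punctuation.
        "category": unicodedata.category(ch),
        # Raw decomposition mapping; a "<compat>" tag marks a
        # compatibility (not canonical) decomposition.
        "decomposition": unicodedata.decomposition(ch),
        # NFKC applies compatibility decompositions, so each of these
        # characters is replaced by a run of U+002E FULL STOP.
        "nfkc": unicodedata.normalize("NFKC", ch),
    }


for ch in LEADERS:
    print(describe(ch))
```

Note that only the compatibility normalization forms (NFKC, NFKD) fold these
characters to full stops; NFC and NFD leave them untouched.
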
The existence of those compatibility decompositions should be taken
as a hint that for most purposes, manual leaders should just be
sequences of FULL STOP characters (as you will see, for instance,
in the plain text representations of Internet Drafts or RFCs).

In any rich text format, leaders are styled formatting objects
(somewhat similar to tabulations, as Philippe suggested), but that
does *not* make U+2024 a format character (LEADER PLACEHOLDER, or
whatever). It is exactly what it claims to be: a compatibility
character, punctuation, with a single baseline dot as its glyph.

--Ken

