On 17/01/2019 12:21, Philippe Verdy via Unicode wrote:
[quoted mail]
But the French "espace fine insécable" (fine non-breaking space) was requested long before Mongolian
was discussed for encoding in the UCS.
If so, we should be able to find its encoding proposal in the UTC document
registry, but Google Search seems unable to retrieve it, so there is a real risk
that no such proposal exists, even though the registry goes back to 1990.
The only thing my searches have turned up is that the part of UAX #14
that I’ve quoted in the parent thread was added by a Unicode Technical
Director not mentioned in the author field, and that he did so at the request of
two gentlemen cited by first name only. I believe their full names are
Martin J. Dürst and Patrick Andries, but I may be wrong.
I apologize for the comment I made in my e‑mail. Still, it would be good to
learn why the French use of NNBSP is taken with a grain of salt, while
all involved parties knew that NNBSP was (as it still is) the only
Unicode character ever encoded that can represent the long-requested “espace
fine insécable.”
There is another question I have been asking for a while: why wasn’t the character
U+2008 PUNCTUATION SPACE given the line break property value "GL" like its
sibling U+2007 FIGURE SPACE?
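(For illustration, the difference is easy to observe with a line-break iterator.
This is a minimal Python sketch, assuming the PyICU package is installed; the
function name is mine, and the offsets in the comments are what ICU should
report per UAX #14, where GL forbids a break and BA allows one after the space.)

    import icu

    def line_break_opportunities(text):
        # ICU line breaking follows UAX #14: GL forbids a break around
        # U+2007 FIGURE SPACE, BA allows a break after U+2008 PUNCTUATION SPACE.
        bi = icu.BreakIterator.createLineInstance(icu.Locale.getFrench())
        bi.setText(text)
        return list(bi)  # iterating yields the allowed break offsets

    print(line_break_opportunities("1\u2007234"))  # [5]: no break inside
    print(line_break_opportunities("1\u2008234"))  # [2, 5]: break after the space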
This addition to UAX #14 is dated as early as 2007-08-08. Why was the Core
Specification not updated in sync, but only seven years later? And was Unicode
aware that this whitespace is so disliked by the industry that a
major vendor denied it support in a major font at a major release of a major OS?
Or did they wait in vain for Martin and Patrick to come knocking at their door to
beg for font support?
Regards,
Marcel
The problem is that the initial push for French came in a period when
Unicode and ISO were competing and not in sync, so no agreement could be reached
until it was decided to merge the efforts. The early ISO work was still
based on a glyph model rather than a character model, with little desire to
support multiple whitespaces; on the Unicode side, there was initially no
ambition to encode all languages and scripts, the focus being only on
unifying the existing vendor character sets already
implemented by a limited set of proprietary vendors (notably
IBM, Microsoft, HP, Digital), plus a few of the charsets registered with IANA,
including the existing ISO 8859-* sets, GBK, and some national or de facto
standards (Russia, Thailand, Japan, Korea).
This early push did not involve typographers (well, there was Adobe at the time, but it was
still using another, unrelated technology). Font standards did not yet exist and
competed in incompatible ways; everything was a mess then, so publishers still
had to use proprietary software solutions with very low interoperability (at that
time the only "standard" was PostScript, which needed no character encoding at
all, only glyph encodings!).
If publishers had been involved, they would have revealed that they all needed various whitespaces for correct typography (i.e. layout). Type foundries themselves did not care about whitespaces because whitespaces had no value for them (no glyph to sell). Adobe's publishing software was then completely proprietary (just like Microsoft's, and others such as Lotus and WordPerfect...). Years ago I was working for the French press, and they absolutely required us to manage the [FINE] for use in newspapers, classified ads, articles, guides, phone books, and dictionaries. It was even mandatory to enter these [FINE] in the composed text, and they trained their typists and ad sellers to use it (that character was not "sold" in classified ads; it was necessary for correct layout, notably in narrow columns, and not using it confused readers, notably before the ":" colon). It had to be non-breaking, non-expanding under justification, narrower than digits and even narrower than the standard non-justified whitespace,
and it was consistently used as a decimal grouping separator.
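To make those requirements concrete, here is a minimal Python sketch (my own
illustration, not part of the press workflow described above) that uses
U+202F NARROW NO-BREAK SPACE as the "fine", both before the colon and as the
grouping separator:

    FINE = "\u202f"  # NNBSP: non-breaking, narrower than a normal space

    def format_fr(n):
        s = f"{n:,.2f}"             # "1,234,567.89"
        s = s.replace(",", FINE)    # group digits with the fine
        return s.replace(".", ",")  # French decimal comma

    print(f"Total{FINE}: {format_fr(1234567.89)}")
    # -> "Total : 1 234 567,89" (U+202F before ":" and between digit groups)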
But at that time the most common OSes did not support it natively because no
vendor charset included it (and in fact most OSes were still unable to render
proportional fonts everywhere and were frequently limited to 8-bit encodings: DOS,
Windows, the various Unixes, and even Linux at its early start). So an intermediate
solution was needed. The US chose not to use the non-breaking thin space at all, because
English did not need it for basic Latin, but also because of the huge prevalence of
7-bit ASCII for everything (including its own national symbol for "$", competing with
other ISO 646 variants). There were tons of legacy applications, developed over decades,
that did not support anything else; in the US, interoperability was available only with
ASCII, and everything else was unreliable.
If you remember the early years when the Internet started to develop outside the US, you remember the
nightmare of non-interoperable 8-bit charsets and the famous "mojibake" we saw
everywhere. Then the competition between ISO and Unicode lasted too long, and it was considered
"too late" for French to change anything (Windows, used in so many places by so many
users, promoted the Windows-1252 charset, which had a few updates before being frozen
for good: there was no place for NNBSP in it). Typographers and publishers were upset: to use the
NNBSP they still needed proprietary *document* encodings. The W3C did not help much either (it
took a long time to finally adopt the UCS as a mandatory component of HTML; before that, HTML depended only
on the old IANA charset database, which promoted only the work of vendors and a few ISO standards).
France itself wanted to keep its own national variant of ISO 646 (inherited
from telegraphic systems), but it was finally abandoned: everybody was already
using Windows-1252 or ISO 8859-1 (even early Unix adopters, who used a
preliminary version made by Digital/DEC and then promoted by X11), or otherwise
used Adobe's proprietary encodings. Unix itself had no standard (there were many
different variants, along with other OSes for industrial or accounting
systems, made notably by IBM, which created almost one variant per
submarket, multiple ones in the same country, each time split between
those based on ASCII and those based on EBCDIC...).
The truth is that publishers were forgotten, because their commercial market
was much narrower: each publisher then used its own internal conventions. Even
libraries used their own classifications. There was no attempt to unify the
needs of publishers (working at the document level) and data processors (including
OSes). This effort started only very late, when the W3C finally began to work
seriously on fixing HTML and making it more or less interoperable with SGML
(promoted by publishers). But at the national level there were still lots of other
competing standards (remember teletext, including the Minitel terminal
and Antiope for TV). People at home had no access to any system capable
of rendering proportional fonts. All early personal computers were
based on fixed-width 8-bit fonts (including in Japan). China and Korea were
not yet as technologically advanced as they are today (there were some efforts, but
they were costly and there was little return at that time).
The adoption of the UCS took an extremely long time, and it is still not completely
finished even though its support is now mandatory in all new computing standards
and their revisions. The last segment where it still meets resistance is the mobile
phone industry (how can SMS be so restricted, so non-interoperable,
and so inefficient?).
So French has a long tradition for its "fine"; its support was demanded long ago but constantly ignored by the vendors making "the" standard. Publishers themselves resisted the adoption of the web as a publishing platform: they preferred their legacy solutions as well, and did not care much about interoperability, so they did not put enough pressure on the standard makers to adopt the "fine". The same happened in the US. There was no "commercial" incentive to adopt it and little money coming from that sector (which has since suffered a lot from the loss of advertising revenue, the competition of online publishers, and the explosion of paper costs, but also from the huge level of piracy on the Internet, which reduced their sales and then their effective measured audience; the same is now happening in the TV and radio market; and on the Internet the advertising market has been heavily concentrated and its revenues are less and less balanced; photographers and reporters now have difficulties
living from their work).
And there's little incentive now to create quality products: so many products are developed and
distributed very fast, and not enough people care about quality or will pay for it. The good old practices
of typographers and publishers are most often ignored; they look "exotic" or
"old-fashioned", and many people now say they are "not needed" (just as they'll say
that supporting multiple languages is not necessary).