Re: Diaeresis vs. umlaut (was: Re: Standardized variation sequences for the Deseret alphabet?)

2017-03-26 Thread Doug Ewell

Philippe Verdy wrote:


> Or maybe, only for historic texts, we could add a combining lowercase
> e as an alternative to the existing diaeresis.


Something like U+0364 COMBINING LATIN SMALL LETTER E, maybe?
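A quick check of that suggestion, using Python's standard `unicodedata` module (Python is our choice of tool, not something used in the thread): U+0364 exists as a combining mark of its own, distinct from U+0308 COMBINING DIAERESIS, and normalization never maps one onto the other. The names `historic` and `modern` below are purely illustrative.

```python
import unicodedata

# U+0308 is the mark used in the canonical decompositions of ä, ö, ü.
DIAERESIS = "\u0308"
# U+0364 is the superscript e that Doug points to.
SMALL_E = "\u0364"

for mark in (DIAERESIS, SMALL_E):
    # Name and canonical combining class (230 = above).
    print(f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.combining(mark))

historic = "u" + SMALL_E   # umlaut written with a superscript e (illustrative)
modern = "u" + DIAERESIS   # umlaut written with two dots (illustrative)

# Normalization never maps one spelling onto the other.
print(unicodedata.normalize("NFC", historic) == unicodedata.normalize("NFC", modern))  # False
print(unicodedata.normalize("NFC", historic) == historic)  # True: no precomposed form
```

So text using U+0364 stays distinct from text using the dots, which is what would let a historic edition keep the two apart.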

--
Doug Ewell | Thornton, CO, US | ewellic.org



Re: Diaeresis vs. umlaut (was: Re: Standardized variation sequences for the Deseret alphabet?)

2017-03-26 Thread Martin J. Dürst

On 2017/03/25 03:33, Doug Ewell wrote:

> Philippe Verdy wrote:
>
>> But Unicode just preferred to keep the roundtrip compatibility with
>> earlier 8-bit encodings (including existing ISO 8859 and DIN
>> standards) so that "ü" in German and French also have the same
>> canonical decomposition even if the diacritic is a diaeresis in French
>> and an umlaut in German, with different semantics and origins.
>
> Was this only about compatibility, or perhaps also that the two signs
> look identical and that disunifying them would have caused endless
> confusion and misuse among users?


I'm not sure to what extent this was explicitly discussed when Unicode 
was created. The fact that the first 256 code points are identical to 
those in ISO-8859-1 was used as a big selling point when Unicode was 
first introduced. It may well have been that for Unicode, there was no 
discussion at all in this area, because ISO-8859-1 was already so well 
established.
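The claim that the first 256 code points are identical to ISO-8859-1 can be checked directly. A minimal sketch, using Python's built-in `latin-1` codec (our choice of tool) to confirm that every byte decodes to the code point with the same number and encodes back to the same byte, i.e. the mapping is bijective:

```python
def latin1_is_identity() -> bool:
    """Return True if every ISO-8859-1 byte decodes to the code point with the same value."""
    for byte in range(256):
        char = bytes([byte]).decode("latin-1")
        if ord(char) != byte:
            return False
        # Round trip: encoding the character again must give back the original byte.
        if char.encode("latin-1") != bytes([byte]):
            return False
    return True

print(latin1_is_identity())  # True
# Byte 0xFC is 'ü' in ISO-8859-1 and U+00FC in Unicode, whatever the language.
print(hex(ord(b"\xfc".decode("latin-1"))))  # 0xfc
```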


And for ISO-8859-1, space was an important concern. Ideally, both 
Icelandic and Turkish (and the letters missing for French) would have been 
covered, but that wasn't possible. Disunifying diaeresis and umlaut 
would have been an unaffordable luxury.


The above reasons mask any inherent reasons why diaeresis and umlaut 
would or would not have been unified if the decision had been argued purely 
"on the merits". But having used both German and French, and e.g. looking 
at the situation in Switzerland, where it was important to be able to 
write both French and German on the same typewriter, I would definitely 
argue that disunifying them would have caused endless confusion and errors 
among users.

Also, it was argued a few mails ago that diaeresis and umlaut don't look 
exactly the same. I remember well that when Apple introduced its first 
laser printers, there were widespread complaints that the fonts (was it 
Helvetica, Times Roman, and Palatino?) unified away the traditional 
differences in the cuts of these typefaces for different languages.


So to quite some extent, in the relevant period (i.e. the 1970s/80s), 
the differences between diaeresis and umlaut may be due to design 
differences in the cuts for different languages (e.g. French and 
German). Nobody would have disunified some basic letters because they 
may have looked slightly different in cuts for different languages, and 
so people may also have been just fine with unifying diaeresis and 
umlaut. (German fonts, e.g., may have contained an 'ë' for use in 
"Citroën", but the dots on that 'ë' would have had the same shape as 
the umlauts on 'ä', 'ö', and 'ü' for design consistency, and the other way 
round for French.)


Regards, Martin.


Re: Diaeresis vs. umlaut (was: Re: Standardized variation sequences for the Deseret alphabet?)

2017-03-24 Thread Philippe Verdy
Given the history of characters and the initial desire to be compatible
with previous ISO standards, I am convinced that there was no other choice
than preserving the unification; otherwise it would have been impossible to
reliably remap the zillions of documents, databases, and applications that
were using ISO 8859 and the related Windows, Mac OS, and IBM code pages for
OEMs or for EBCDIC. Add to this the development of the Internet, and the
desire in both Unicode and ISO 10646 to keep the first page of code points
in the UCS fully compatible, code for code, with ISO 8859-1. There was also
no separate variant of ISO 8859-1 standardized for Germany, Switzerland,
Austria, Belgium, or Luxembourg, and none was requested (it would have
caused nightmares, notably in the last three countries, and a lot of legacy
software on Windows and Mac OS needed such a bijective mapping). Finally,
Unicode was initially developed separately from the ISO standard and the two
were merged later; at that time, Microsoft and IBM were the most active
members and did not want to introduce incompatibilities and cause trouble
for other vendors.
Later there was a clear statement to keep the basic character properties
stable, and it became impossible to change the canonical equivalences
(after the bad experience of merging the Unicode and ISO efforts, notably
for encoding Hangul, and a strong initial resistance from China, which
wanted to develop its own GB standard).
Encoding stability is now a rule that will be extremely hard to break.
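To make the unification concrete: there is one precomposed ü (U+00FC), and its canonical decomposition is u + U+0308 whether the word is German or French. A short sketch with Python's standard `unicodedata` module (the sample words are our own illustrations) shows normalization treating both identically, which is the equivalence the stability policy now freezes:

```python
import unicodedata

german = "\u00fcber"   # 'über': the mark is an umlaut
french = "Sa\u00fcl"   # 'Saül': the mark is a diaeresis

for word in (german, french):
    nfd = unicodedata.normalize("NFD", word)
    # List the code points after canonical decomposition.
    print(word, "->", " ".join(f"U+{ord(c):04X}" for c in nfd))

# The precomposed letter has a single canonical decomposition, whatever the language.
print(unicodedata.decomposition("\u00fc"))  # 0075 0308
```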

Note: umlauts and diaeresis have not always looked the same; the confusion
between the two started only around the middle of the 20th century, with the
early development of computing. It would have been impossible to reach wide
adoption of the UCS without such compromises. It took additional years,
after both projects joined their efforts, before ISO finally closed its
working group on legacy 8-bit character sets and stopped accepting new
variants; ISO 8859-15 was one of the last, failed attempts to standardize a
new 8-bit encoding, which in the end almost nobody really used, as they no
longer needed it. China, too, gave up its resistance and finalized the
round-trip mapping between its competing GB 18030 encoding and the UCS, so
mappings for GB 18030 no longer need updates: any character newly encoded in
the UCS is immediately encoded in GB as well, without modifying a single line
of code or data, and any software or document compatible with the UCS should
be immediately compatible with the GB 18030 standard required in the PRC. I
don't know whether the Hong Kong authorities made the same statement for
their HKCS standard before Hong Kong reunified with China, or whether Taiwan
made a similar decision. Japan, however, keeps adding new characters to its
JIS standard, pushed by national vendors; the UCS is still slow to accept
these additions and not all of them are accepted, but in this area there is
a local subcommittee constantly negotiating with Asian vendors and reporting
its efforts to Unicode and ISO.
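The point that GB 18030 covers every UCS code point can be illustrated with Python's built-in `gb18030` codec. This is only a sketch of round-tripping through that one codec (our choice of implementation; it does not check any particular edition of GB 18030), including a supplementary-plane character, which falls in the algorithmically mapped four-byte range:

```python
def roundtrips_gb18030(text: str) -> bool:
    """Encode text to GB 18030 and decode it back; True if nothing was lost."""
    return text.encode("gb18030").decode("gb18030") == text

samples = [
    "\u00fc",        # ü, Latin-1 range
    "\u4e2d\u6587",  # 中文, CJK ideographs
    "\U0001F600",    # supplementary plane (emoji)
]
for s in samples:
    # Show the GB 18030 bytes and whether the round trip is lossless.
    print(repr(s), s.encode("gb18030").hex(), roundtrips_gb18030(s))
```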

About umlauts and diaeresis, I'm not sure they always looked the same.
If we try to encode old German, Hungarian or Czech texts, we may find some
discrepancies or ambiguities (and there is still no mechanism to distinguish
when an umlaut is really desired and when a diaeresis is desired instead, if
they don't look the same in historic script variants). We cannot encode
these using "variants", but we could possibly use some combining control
such as CGJ (encoded after the precomposed letter or after the base
letter + diaeresis; because of canonical equivalences it cannot go in the
middle). Or maybe, only for historic texts, we could add a combining
lowercase e as an alternative to the existing diaeresis.
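The CGJ placement argument can be explored with normalization. A sketch in Python using only the standard `unicodedata` module (the sequences and the `nfc` helper are illustrative, not a recommended encoding): U+034F COMBINING GRAPHEME JOINER has combining class 0, so after the letter + diaeresis it leaves the ü intact under NFC, while between u and U+0308 it blocks canonical composition and gives a string that is no longer canonically equivalent to ü.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER, combining class 0
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, combining class 230

after_precomposed = "\u00fc" + CGJ       # ü + CGJ
after_sequence = "u" + DIAERESIS + CGJ   # u + diaeresis + CGJ
in_the_middle = "u" + CGJ + DIAERESIS    # u + CGJ + diaeresis

def nfc(s: str) -> str:
    """Hypothetical helper: canonical composition (NFC)."""
    return unicodedata.normalize("NFC", s)

# The first two are canonically equivalent: both normalize to ü + CGJ.
print(nfc(after_precomposed) == nfc(after_sequence))  # True
# In the middle, CGJ blocks composition, so the result is a different string.
print(nfc(in_the_middle) == nfc(after_sequence))      # False
print(unicodedata.combining(CGJ))                     # 0
```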


2017-03-24 19:33 GMT+01:00 Doug Ewell :

> Philippe Verdy wrote:
>
> > But Unicode just preferred to keep the roundtrip compatibility with
> > earlier 8-bit encodings (including existing ISO 8859 and DIN
> > standards) so that "ü" in German and French also have the same
> > canonical decomposition even if the diacritic is a diaeresis in French
> > and an umlaut in German, with different semantics and origins.
>
> Was this only about compatibility, or perhaps also that the two signs
> look identical and that disunifying them would have caused endless
> confusion and misuse among users?
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org


Re: Diaeresis vs. umlaut (was: Re: Standardized variation sequences for the Deseret alphabet?)

2017-03-24 Thread Hans Åberg

> On 24 Mar 2017, at 19:33, Doug Ewell  wrote:
> 
> Philippe Verdy wrote:
> 
>> But Unicode just preferred to keep the roundtrip compatibility with
>> earlier 8-bit encodings (including existing ISO 8859 and DIN
>> standards) so that "ü" in German and French also have the same
>> canonical decomposition even if the diacritic is a diaeresis in French
>> and an umlaut in German, with different semantics and origins.
> 
> Was this only about compatibility, or perhaps also that the two signs
> look identical and that disunifying them would have caused endless
> confusion and misuse among users?

The Swedish letters ÅÄÖ are simplified ligatures, and not diacritic marks. For 
Ä and Ö, handwritten script style uses a tilde, the same as in Spanish Ñ, which 
is also a simplified ligature.





Diaeresis vs. umlaut (was: Re: Standardized variation sequences for the Deseret alphabet?)

2017-03-24 Thread Doug Ewell
Philippe Verdy wrote:

> But Unicode just preferred to keep the roundtrip compatibility with
> earlier 8-bit encodings (including existing ISO 8859 and DIN
> standards) so that "ü" in German and French also have the same
> canonical decomposition even if the diacritic is a diaeresis in French
> and an umlaut in German, with different semantics and origins.

Was this only about compatibility, or perhaps also that the two signs
look identical and that disunifying them would have caused endless
confusion and misuse among users?

--
Doug Ewell | Thornton, CO, US | ewellic.org