Re: Diaeresis vs. umlaut (was: Re: Standardized variation sequences for the Deseret alphabet?)
Philippe Verdy wrote:

> Or maybe, only for historic texts, we could add a combining lowercase e as an alternative to the existing diaeresis.

Something like U+0364 COMBINING LATIN SMALL LETTER E, maybe?

--
Doug Ewell | Thornton, CO, US | ewellic.org
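The practical difference between the existing diaeresis and U+0364 can be checked against the normalization data. The minimal sketch below uses Python's standard `unicodedata` module; the helper `nfc` and the constant names are illustrative choices, not from the thread. It shows that u + U+0308 COMBINING DIAERESIS composes to the precomposed U+00FC under NFC, whereas u + U+0364 has no precomposed form and is not canonically equivalent to U+00FC, so the two spellings would stay distinct in normalized text.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"   # U+0308, shared by umlaut and diaeresis today
COMBINING_SMALL_E = "\u0364"     # U+0364 COMBINING LATIN SMALL LETTER E
PRECOMPOSED_U_UMLAUT = "\u00fc"  # U+00FC LATIN SMALL LETTER U WITH DIAERESIS


def nfc(s: str) -> str:
    """Return the canonical composition (NFC) of s."""
    return unicodedata.normalize("NFC", s)


for mark in (COMBINING_DIAERESIS, COMBINING_SMALL_E):
    seq = "u" + mark
    # Does u + mark normalize to the same string as the precomposed U+00FC?
    equivalent = nfc(seq) == PRECOMPOSED_U_UMLAUT
    print(f"u + U+{ord(mark):04X} {unicodedata.name(mark)}: equivalent to U+00FC? {equivalent}")
```

Because U+0364 is only a combining mark, text using it still needs a font that renders the small e above the base letter.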
Re: Diaeresis vs. umlaut (was: Re: Standardized variation sequences for the Deseret alphabet?)
On 2017/03/25 03:33, Doug Ewell wrote:

> Philippe Verdy wrote:
>
>> But Unicode just preferred to keep the roundtrip compatibility with earlier 8-bit encodings (including existing ISO 8859 and DIN standards) so that "ü" in German and French also have the same canonical decomposition even if the diacritic is a diaeresis in French and an umlaut in German, with different semantics and origins.
>
> Was this only about compatibility, or perhaps also that the two signs look identical and that disunifying them would have caused endless confusion and misuse among users?

I'm not sure to what extent this was explicitly discussed when Unicode was created. The fact that the first 256 code points are identical to those in ISO-8859-1 was used as a big selling point when Unicode was first introduced. It may well be that for Unicode there was no discussion at all in this area, because ISO-8859-1 was already so well established. And for ISO-8859-1, space was an important concern. Ideally, both Icelandic and Turkish (and the letters missing for French) would have been covered, but that wasn't possible. Disunifying diaeresis and umlaut would have been an unaffordable luxury.

The above reasons mask any inherent reasons why diaeresis and umlaut would or would not have been unified if the decision had been argued purely "on the merits". But having used both German and French, and looking e.g. at the situation in Switzerland, where it was important to be able to write both French and German on the same typewriter, I would definitely argue that disunifying them would have caused endless confusion and errors among users.

Also, it was argued a few mails ago that diaeresis and umlaut don't look exactly the same. I remember well that when Apple introduced its first laser printers, there were widespread complaints that the fonts (was it Helvetica, Times Roman, and Palatino?) unified away the traditional differences in the cuts of these typefaces for different languages. So to quite some extent, in the relevant period (i.e. the 1970s/80s), the differences between diaeresis and umlaut may be due to design differences in the cuts for different languages (e.g. French and German). Nobody would have disunified basic letters just because they looked slightly different in the cuts for different languages, and so people may also have been just fine with unifying diaeresis and umlaut. (German fonts, for example, may have contained an 'ë' for words such as "Citroën", but the dots on that 'ë' would have had the same shape as the 'ä', 'ö', and 'ü' umlauts for design consistency, and the other way round for French.)

Regards, Martin.
Re: Diaeresis vs. umlaut (was: Re: Standardized variation sequences for the Deseret alphabet?)
Given the history of characters and the initial desire to be compatible with previous ISO standards, I am convinced that there was no choice other than preserving the unification; otherwise it would have been impossible to reliably remap the zillions of documents, databases, and applications that were using ISO 8859 and the related Windows, MacOS, and IBM code pages for OEMs or for EBCDIC.

Then came the development of the Internet, and the desire in both Unicode and ISO 10646 to keep the first page of code points in the UCS fully compatible, code for code, with ISO 8859-1. There was also no variant of ISO 8859-1 standardized for Germany, Switzerland, Austria, Belgium, and Luxembourg that did not require it (which caused nightmares, notably in the last three countries), and a lot of legacy software on Windows and MacOS needed such a bijective mapping. Finally, the Unicode Consortium initially developed its standard separately from the ISO standard, and the two were merged later; at that time, Microsoft and IBM were the most active members, and they did not want to introduce incompatibilities and cause trouble for other vendors.

Later there was a clear commitment to keep the basic character properties stable, and it became impossible to change the canonical equivalences (after the bad experience of merging the Unicode and ISO efforts, notably for encoding Hangul, and strong initial resistance from China, which wanted to develop its own GB standard). Encoding stability is now a rule that will be extremely hard to break.

Note: umlauts and diaeresis have not always looked the same; the confusion between the two started relatively late, around the middle of the 20th century, as computing began to develop.
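The code-for-code compatibility between ISO 8859-1 and the first 256 UCS code points can be checked with a short sketch using Python's built-in `latin-1` and `cp1252` codecs (the choice of Python and of these two codecs is ours, for illustration). It verifies that every byte decodes to the code point with the same number and encodes back to the same byte, i.e. that the mapping is a bijection, and contrasts this with Windows-1252, where byte 0x80 maps to U+20AC EURO SIGN rather than U+0080.

```python
def latin1_roundtrip_is_identity() -> bool:
    """True if each byte 0x00-0xFF decodes to the same-numbered code point and back."""
    for b in range(256):
        raw = bytes([b])
        ch = raw.decode("latin-1")
        if ord(ch) != b or ch.encode("latin-1") != raw:
            return False
    return True


print(latin1_roundtrip_is_identity())
# ü is byte 0xFC in ISO 8859-1 and U+00FC in Unicode, whatever its function
print("\u00fc".encode("latin-1"))
# Windows-1252 is not code-for-code compatible in the range 0x80-0x9F
print(hex(ord(bytes([0x80]).decode("cp1252"))))
```

Since the identity covers the whole byte range, a single code point has to serve every use of the ISO 8859-1 byte 0xFC, German umlaut and French diaeresis alike.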
It would have been impossible to achieve wide adoption of the UCS without such compromises. It took additional years after both projects joined their efforts before ISO finally closed its working group on legacy 8-bit character sets and stopped accepting new variants; ISO 8859-15 was one of the last failed attempts to standardize a new 8-bit encoding, and in the end almost nobody really used it, as it was no longer needed. China gave in as well and finalized the roundtrip mapping between its competing GB 18030 encoding and the UCS, so the GB 18030 mappings no longer need updates: any new character encoded in the UCS is immediately encoded in GB 18030 as well, without modifying any line of code or data, and any software or document compatible with the UCS should be immediately compatible with the GB 18030 standard required in the PRC. I don't know whether the Hong Kong authorities made the same statement for their HKSCS standard before Hong Kong was reunified with China, or whether Taiwan made a similar decision. Japan, however, keeps adding new characters to its JIS standards, pushed by national vendors; the UCS is still slow to accept these additions and not all of them are accepted, but in this area there is a local subcommittee constantly negotiating with Asian vendors and reporting its efforts to Unicode and ISO.

About umlauts and diaeresis, I'm not sure they always looked the same. If we try to encode old German, Hungarian, or Czech texts, we may find some discrepancies or ambiguities (but there is still no mechanism to distinguish when an umlaut is really desired and when a diaeresis is desired instead, if they don't look the same in historic script variants). We cannot encode these using "variants", but we might use combining controls such as CGJ (encoded after the precomposed letter or after the base letter + diaeresis; because of canonical equivalences, it cannot go in the middle).
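How normalization treats the CGJ placements mentioned above can be probed with a short Python sketch (standard `unicodedata` module; the variable names are ours). It only demonstrates NFC/NFD behavior: CGJ placed after the precomposed ü, or after u + diaeresis, yields canonically equivalent strings that still contain ü, while CGJ placed between u and the diaeresis, having combining class 0, blocks composition and produces a sequence that is not canonically equivalent to ü. It does not settle which placement, if any, should be used to mark an umlaut as opposed to a diaeresis.

```python
import unicodedata

CGJ = "\u034f"          # U+034F COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"    # U+0308 COMBINING DIAERESIS
U_DIAERESIS = "\u00fc"  # U+00FC precomposed u with diaeresis


def nfd(s: str) -> str:
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


after_precomposed = U_DIAERESIS + CGJ   # ü + CGJ
after_mark = "u" + DIAERESIS + CGJ      # u + diaeresis + CGJ
in_middle = "u" + CGJ + DIAERESIS       # u + CGJ + diaeresis

# The two "after" placements are canonically equivalent to each other
print(nfd(after_precomposed) == nfd(after_mark))
# CGJ has combining class 0 ...
print(unicodedata.combining(CGJ))
# ... so in the middle it blocks composition: NFC leaves the sequence unchanged
print(unicodedata.normalize("NFC", in_middle) == in_middle)
# and the result is not canonically equivalent to ü + CGJ
print(nfd(in_middle) == nfd(after_mark))
```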
Or maybe, only for historic texts, we could add a combining lowercase e as an alternative to the existing diaeresis.

2017-03-24 19:33 GMT+01:00 Doug Ewell:

> Philippe Verdy wrote:
>
>> But Unicode just preferred to keep the roundtrip compatibility with earlier 8-bit encodings (including existing ISO 8859 and DIN standards) so that "ü" in German and French also have the same canonical decomposition even if the diacritic is a diaeresis in French and an umlaut in German, with different semantics and origins.
>
> Was this only about compatibility, or perhaps also that the two signs look identical and that disunifying them would have caused endless confusion and misuse among users?
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
Re: Diaeresis vs. umlaut (was: Re: Standardized variation sequences for the Deseret alphabet?)
On 24 Mar 2017, at 19:33, Doug Ewell wrote:

> Philippe Verdy wrote:
>
>> But Unicode just preferred to keep the roundtrip compatibility with earlier 8-bit encodings (including existing ISO 8859 and DIN standards) so that "ü" in German and French also have the same canonical decomposition even if the diacritic is a diaeresis in French and an umlaut in German, with different semantics and origins.
>
> Was this only about compatibility, or perhaps also that the two signs look identical and that disunifying them would have caused endless confusion and misuse among users?

The Swedish letters ÅÄÖ are simplified ligatures, not letters with diacritic marks. For Ä and Ö, the mark is written as a tilde in handwritten script style, the same as in Spanish Ñ, which is also a simplified ligature.
Diaeresis vs. umlaut (was: Re: Standardized variation sequences for the Deseret alphabet?)
Philippe Verdy wrote:

> But Unicode just preferred to keep the roundtrip compatibility with earlier 8-bit encodings (including existing ISO 8859 and DIN standards) so that "ü" in German and French also have the same canonical decomposition even if the diacritic is a diaeresis in French and an umlaut in German, with different semantics and origins.

Was this only about compatibility, or perhaps also that the two signs look identical and that disunifying them would have caused endless confusion and misuse among users?

--
Doug Ewell | Thornton, CO, US | ewellic.org
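The situation under discussion can be illustrated with a minimal Python sketch (standard `unicodedata` module; the two sample words and the helper `describe_u` are our own choices). The German and the French ü are the same character, U+00FC, with the same canonical decomposition to u + U+0308 COMBINING DIAERESIS, so nothing in the encoded text records whether the mark is an umlaut or a tréma.

```python
import unicodedata

# Sample words (our choice): German umlaut vs. French diaeresis (tréma)
SAMPLES = {"German": "M\u00fcller", "French": "capharna\u00fcm"}


def describe_u(word: str):
    """Return (code point, canonical decomposition mapping) of the first ü in word."""
    ch = word[word.index("\u00fc")]
    return f"U+{ord(ch):04X}", unicodedata.decomposition(ch)


for lang, word in SAMPLES.items():
    code_point, decomposition = describe_u(word)
    print(f"{lang}: {word} -> {code_point}, decomposes to {decomposition}")
```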