Re: The Unicode Standard and ISO [localizable sentences]
> The topic of localizable sentences is now closed on this mail list.
> Please take that topic elsewhere.
> Thank you.

May I please mention, with permission, that there is now a thread to discuss the issue of translations and their context that was mentioned?

https://community.serif.com/discussion/112261/a-discussion-about-translations-and-their-context-localizable-sentences-research-project-related

The thread is in the lounge section of the support forum of Serif, the English software company that produced the program that I use to produce PDF (Portable Document Format) documents.

William Overington

Friday 15 June 2018
Re: The Unicode Standard and ISO
On Tue, 12 Jun 2018 19:49:10 +0200, Mark Davis ☕️ via Unicode wrote:
[…]
> People interested in this topic should
> (a) start up their own project somewhere else,
> (b) take discussion of it off this list,
> (c) never bring it up again on this list.

Thank you for letting us know. I apologize for my e-mailing. I didn’t respond right away, for a variety of reasons, though I fully agreed immediately; I had mainly wondered why I got no feedback when I last closed a thread that was heading the same way, but no matter anymore. No problem: as far as it depends on me, this topic will never be brought up again, here or elsewhere.

Sorry again.

Best regards,

Marcel
Re: The Unicode Standard and ISO [localizable sentences]
The topic of localizable sentences is now closed on this mail list. Please take that topic elsewhere. Thank you.

On 6/12/2018 10:49 AM, Mark Davis ☕️ via Unicode wrote:
> That is often a viable approach. But proponents shouldn't get the wrong impression. I think anything resembling the "localized sentences" / "international message components" has zero chance of being adopted by Unicode (including the encoding, CLDR, anything). It is a waste of many people's time discussing it further on this list.
>
> Why? As discussed many times on this list, it would take a major effort, is not scoped properly (the translation of messages depends highly on context, including specific products), and would not meet the needs of practically anyone.
>
> People interested in this topic should
> (a) start up their own project somewhere else,
> (b) take discussion of it off this list,
> (c) never bring it up again on this list.
Re: The Unicode Standard and ISO
On Mon, Jun 11, 2018 at 8:32 AM, William_J_G Overington < wjgo_10...@btinternet.com> wrote: > Steven R. Loomis wrote: > > >Marcel, > > The idea is not necessarily without merit. However, CLDR does not > usually expand scope just because of a suggestion. > I usually recommend creating a new project first - gathering data, > looking at and talking to projects to ascertain the usefulness of common > messages.. one of the barriers to adding new content for CLDR is not just > the design, but collecting initial data. When emoji or sub-territory names > were added, many languages were included before it was added to CLDR. > > Well, maybe usually, but perhaps not this time? Especially this time. To Mark's later point: Start a separate project. Don't assume it will ever merge with CLDR. If it succeeds, great.
Re: The Unicode Standard and ISO
Steven wrote:
> I usually recommend creating a new project first...

That is often a viable approach. But proponents shouldn't get the wrong impression. I think anything resembling the "localized sentences" / "international message components" has zero chance of being adopted by Unicode (including the encoding, CLDR, anything). It is a waste of many people's time discussing it further on this list.

Why? As discussed many times on this list, it would take a major effort, is not scoped properly (the translation of messages depends highly on context, including specific products), and would not meet the needs of practically anyone.

People interested in this topic should
(a) start up their own project somewhere else,
(b) take discussion of it off this list,
(c) never bring it up again on this list.

Mark

On Tue, Jun 12, 2018 at 4:53 PM, Marcel Schneider via Unicode <unicode@unicode.org> wrote:
> William,
>
> On 12/06/18 12:26, William_J_G Overington wrote:
> > Hi Marcel
> >
> > > I don’t fully disagree with Asmus, as I suggested to make available localizable (and effectively localized) libraries of message components, rather than of entire messages.
> >
> > Could you possibly give some examples of the message components to which you refer please?
>
> Likewise I’d be interested in asking Jonathan Rosenne for an example or two of automated translation from English to bidi languages with data embedded, as on Mon, 11 Jun 2018 15:42:38 +, Jonathan Rosenne via Unicode wrote:
> […]
> > One has to see it to believe what happens to messages translated mechanically from English to bidi languages when data is embedded in the text.
>
> But both would require launching a new thread. Thinking hard enough, I’m even afraid that most subscribers wouldn’t be interested, so we’d have to move off-list.
>
> One alternative I can think of is to use one of the CLDR mailing lists.
> I subscribed to CLDR-users when I was directed to move there some technical discussion about keyboard layouts from Unicode Public.
>
> But now as international message components are not yet a part of CLDR, we’d need to ask for extra permission to do so.
>
> An additional drawback of launching a technical discussion right now is that significant parts of CLDR data are not yet correctly localized, so there is another bunch of priorities under the July 11 deadline. I guess that vendors wouldn’t be glad to see us gathering data for new structures while level=Modern isn’t complete.
>
> In the meantime, you are welcome to contribute and to motivate missing people to do the same.
>
> Best regards,
>
> Marcel
Re: The Unicode Standard and ISO
> ISO 15924 is an ISO standard. Aspects of its content may be mirrored in other places, but “moving its content” to CLDR makes no sense.

Fully agreed.

For what it's worth, I reopened a bug of Roozbeh's, https://unicode.org/cldr/trac/ticket/827?#comment:9 , to make sure the ISO 15924 French content gets properly mirrored into CLDR. It looks like there is a French-specific bug there, which may be what you are seeing, Marcel.

On Tue, Jun 12, 2018 at 8:57 AM, Michael Everson via Unicode <unicode@unicode.org> wrote:
> All right, if you want a clear explanation.
>
> Yes, I think the ISO 8859-4 character names for the Latvian letters were mistaken. Yes, I think that mapping them to decompositions with CEDILLA rather than COMMA BELOW was a mistake. Evidently some felt that the normative mapping was important. This does not mean that SC2 “failed to do its part”, it did not cause a lack of desire for cooperation, and it bloody well did not “damage the reputation of the whole ISO/IEC”.
>
> As to ISO 15924, it was developed bilingually, and there was consensus on the names that are there. Last year you suggested a massive number of name changes to the French translation of ISO/IEC 10646, and I criticized you for foregoing stability for your own preferences. When it came to the names in 15924, I told you that I do not trust your judgement, and that I would consider revisions to the French names when you came back with consensus on those changes with experts Alain LaBonté, Patrick Andries, Denis Jacquerye, and Marc Lodewijck. As I have not heard from them, I conclude that no such consensus exists.
>
> ISO 15924 is an ISO standard. Aspects of its content may be mirrored in other places, but “moving its content” to CLDR makes no sense.
> > Michael Everson > > > On 12 Jun 2018, at 16:20, Marcel Schneider via Unicode < > unicode@unicode.org> wrote: > > On Tue, 12 Jun 2018 15:58:09 +0100, Michael Everson via Unicode wrote: > >> > >> Marcel, > >> You have put words into my mouth. Please don’t. Your description of > what I said is NOT accurate. > >> > >>> On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode wrote: > >>> And in this thread I wanted to demonstrate that by focusing on the > wrong priorities, i.e. legacy character names instead of the practicability > of on-going encoding and the accurateness of specified decompositions—so > that in some instances cedilla was used instead of comma below, Michael > pointed out—, ISO/IEC JTC1 SC2/WG2 failed to do its part and missed its > mission—and thus didn’t inspire a desire of extensive cooperation (and > damaged the reputation of the whole ISO/IEC). > > > > Michael, I’d better quote your actual e-mail: > > > > On Fri, 8 Jun 2018 13:01:48 +0100, Michael Everson via Unicode wrote: > > […] > >> Many things have more than one name. The only truly bad misnomers from > that period was related to a mapping error, > >> namely, in the treatment of Latvian characters which are called CEDILLA > rather than COMMA BELOW. > > > > Now I fail to understand why this mustn’t be reworded to “the > accurateness of specified decompositions—so that in some instances cedilla > was used instead of comma below[.]” If any correction can be made, I’d be > eager to take note. Thanks for correcting. > > > > Now let’s append the e-mail that I was about to send: > > > > Another ISO Standard that needs to be mentioned in this thread is ISO > 15924 (script codes; not ISO/IEC). It has a particular status in that > Unicode is the Registration Authority. > > > > I wonder whether people agree that it has a French version. 
Actually it > does have a French version, but Michael Everson (Registrar) revealed on > this List multiple issues with synching French script names in ISO 15924-fr > and in Code Charts translations. > > > > Shouldn’t this content be moved to CLDR? At least with respect to > localized script names. > > >
Re: The Unicode Standard and ISO
On 6/12/2018 7:58 AM, Michael Everson via Unicode wrote:
> Marcel, You have put words into my mouth. Please don’t. Your description of what I said is NOT accurate.
>
>> On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode wrote:
>> And in this thread I wanted to demonstrate that by focusing on the wrong priorities, i.e. legacy character names instead of the practicability of on-going encoding and the accurateness of specified decompositions—so that in some instances cedilla was used instead of comma below, Michael pointed out—, ISO/IEC JTC1 SC2/WG2 failed to do its part and missed its mission—and thus didn’t inspire a desire of extensive cooperation (and damaged the reputation of the whole ISO/IEC).

The final conclusion isn't backed by the evidence. This kind of fault-finding needs to stop - it's unproductive.

A./
Re: The Unicode Standard and ISO
CLDR already has localized script names. The English is taken from ISO 15924. https://cldr-ref.unicode.org/cldr-apps/v#/fr/Scripts/ On Tue, Jun 12, 2018 at 8:20 AM, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On Tue, 12 Jun 2018 15:58:09 +0100, Michael Everson via Unicode wrote: > > > > Marcel, > > > > You have put words into my mouth. Please don’t. Your description of what > I said is NOT accurate. > > > > > On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode wrote: > > > > > > And in this thread I wanted to demonstrate that by focusing on the > wrong priorities, i.e. legacy character names instead of > > > the practicability of on-going encoding and the accurateness of > specified decompositions—so that in some instances cedilla > > > was used instead of comma below, Michael pointed out—, ISO/IEC JTC1 > SC2/WG2 failed to do its part and missed its mission— > > > and thus didn’t inspire a desire of extensive cooperation (and damaged > the reputation of the whole ISO/IEC). > > Michael, I’d better quote your actual e-mail: > > On Fri, 8 Jun 2018 13:01:48 +0100, Michael Everson via Unicode wrote: > […] > > Many things have more than one name. The only truly bad misnomers from > that period was related to a mapping error, > > namely, in the treatment of Latvian characters which are called CEDILLA > rather than COMMA BELOW. > > Now I fail to understand why this mustn’t be reworded to “the accurateness > of specified decompositions—so that in some instances > cedilla was used instead of comma below[.]” > If any correction can be made, I’d be eager to take note. > Thanks for correcting. > > Now let’s append the e-mail that I was about to send: > > Another ISO Standard that needs to be mentioned in this thread is ISO > 15924 (script codes; not ISO/IEC). > It has a particular status in that Unicode is the Registration Authority. > > I wonder whether people agree that it has a French version. 
Actually it > does have a French version, but > Michael Everson (Registrar) revealed on this List multiple issues with > synching French script names in > ISO 15924-fr and in Code Charts translations. > > Shouldn’t this content be moved to CLDR? At least with respect to > localized script names. >
Re: The Unicode Standard and ISO
All right, if you want a clear explanation.

Yes, I think the ISO 8859-4 character names for the Latvian letters were mistaken. Yes, I think that mapping them to decompositions with CEDILLA rather than COMMA BELOW was a mistake. Evidently some felt that the normative mapping was important. This does not mean that SC2 “failed to do its part”, it did not cause a lack of desire for cooperation, and it bloody well did not “damage the reputation of the whole ISO/IEC”.

As to ISO 15924, it was developed bilingually, and there was consensus on the names that are there. Last year you suggested a massive number of name changes to the French translation of ISO/IEC 10646, and I criticized you for foregoing stability for your own preferences. When it came to the names in 15924, I told you that I do not trust your judgement, and that I would consider revisions to the French names when you came back with consensus on those changes with experts Alain LaBonté, Patrick Andries, Denis Jacquerye, and Marc Lodewijck. As I have not heard from them, I conclude that no such consensus exists.

ISO 15924 is an ISO standard. Aspects of its content may be mirrored in other places, but “moving its content” to CLDR makes no sense.

Michael Everson

> On 12 Jun 2018, at 16:20, Marcel Schneider via Unicode wrote:
> On Tue, 12 Jun 2018 15:58:09 +0100, Michael Everson via Unicode wrote:
>>
>> Marcel,
>> You have put words into my mouth. Please don’t. Your description of what I said is NOT accurate.
>>
>>> On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode wrote:
>>> And in this thread I wanted to demonstrate that by focusing on the wrong priorities, i.e. legacy character names instead of the practicability of on-going encoding and the accurateness of specified decompositions—so that in some instances cedilla was used instead of comma below, Michael pointed out—, ISO/IEC JTC1 SC2/WG2 failed to do its part and missed its mission—and thus didn’t inspire a desire of extensive cooperation (and damaged the reputation of the whole ISO/IEC).
>
> Michael, I’d better quote your actual e-mail:
>
> On Fri, 8 Jun 2018 13:01:48 +0100, Michael Everson via Unicode wrote:
> […]
>> Many things have more than one name. The only truly bad misnomer from that period was related to a mapping error, namely, in the treatment of Latvian characters which are called CEDILLA rather than COMMA BELOW.
>
> Now I fail to understand why this mustn’t be reworded to “the accurateness of specified decompositions—so that in some instances cedilla was used instead of comma below[.]” If any correction can be made, I’d be eager to take note. Thanks for correcting.
>
> Now let’s append the e-mail that I was about to send:
>
> Another ISO Standard that needs to be mentioned in this thread is ISO 15924 (script codes; not ISO/IEC). It has a particular status in that Unicode is the Registration Authority.
>
> I wonder whether people agree that it has a French version. Actually it does have a French version, but Michael Everson (Registrar) revealed on this List multiple issues with synching French script names in ISO 15924-fr and in Code Charts translations.
>
> Shouldn’t this content be moved to CLDR? At least with respect to localized script names.
Re: The Unicode Standard and ISO
On Tue, 12 Jun 2018 15:58:09 +0100, Michael Everson via Unicode wrote: > > Marcel, > > You have put words into my mouth. Please don’t. Your description of what I > said is NOT accurate. > > > On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode wrote: > > > > And in this thread I wanted to demonstrate that by focusing on the wrong > > priorities, i.e. legacy character names instead of > > the practicability of on-going encoding and the accurateness of specified > > decompositions—so that in some instances cedilla > > was used instead of comma below, Michael pointed out—, ISO/IEC JTC1 SC2/WG2 > > failed to do its part and missed its mission— > > and thus didn’t inspire a desire of extensive cooperation (and damaged the > > reputation of the whole ISO/IEC). Michael, I’d better quote your actual e-mail: On Fri, 8 Jun 2018 13:01:48 +0100, Michael Everson via Unicode wrote: […] > Many things have more than one name. The only truly bad misnomers from that > period was related to a mapping error, > namely, in the treatment of Latvian characters which are called CEDILLA > rather than COMMA BELOW. Now I fail to understand why this mustn’t be reworded to “the accurateness of specified decompositions—so that in some instances cedilla was used instead of comma below[.]” If any correction can be made, I’d be eager to take note. Thanks for correcting. Now let’s append the e-mail that I was about to send: Another ISO Standard that needs to be mentioned in this thread is ISO 15924 (script codes; not ISO/IEC). It has a particular status in that Unicode is the Registration Authority. I wonder whether people agree that it has a French version. Actually it does have a French version, but Michael Everson (Registrar) revealed on this List multiple issues with synching French script names in ISO 15924-fr and in Code Charts translations. Shouldn’t this content be moved to CLDR? At least with respect to localized script names.
Re: The Unicode Standard and ISO
Marcel, You have put words into my mouth. Please don’t. Your description of what I said is NOT accurate. > On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode > wrote: > > And in this thread I wanted to demonstrate that by focusing on the wrong > priorities, i.e. legacy character names instead of the practicability of > on-going encoding and the accurateness of specified decompositions—so that in > some instances cedilla was used instead of comma below, Michael pointed out—, > ISO/IEC JTC1 SC2/WG2 failed to do its part and missed its mission—and thus > didn’t inspire a desire of extensive cooperation (and damaged the reputation > of the whole ISO/IEC).
Re: The Unicode Standard and ISO
William, On 12/06/18 12:26, William_J_G Overington wrote: > > Hi Marcel > > > I don’t fully disagree with Asmus, as I suggested to make available > > localizable (and effectively localized) libraries of message components, > > rather than of entire messages. > > Could you possibly give some examples of the message components to which you > refer please? > Likewise I’d be interested in asking Jonathan Rosenne for an example or two of automated translation from English to bidi languages with data embedded, as on Mon, 11 Jun 2018 15:42:38 +, Jonathan Rosenne via Unicode wrote: […] > > > One has to see it to believe what happens to messages translated > > > mechanically from English to bidi languages when data is embedded in the > > > text. But both would require launching a new thread. Thinking hard enough, I’m even afraid that most subscribers wouldn’t be interested, so we’d have to move off-list. One alternative I can think of is to use one of the CLDR mailing lists. I subscribed to CLDR-users when I was directed to move there some technical discussion about keyboard layouts from Unicode Public. But now as international message components are not yet a part of CLDR, we’d need to ask for extra permission to do so. An additional drawback of launching a technical discussion right now is that significant parts of CLDR data are not yet correctly localized so there is another bunch of priorities under July 11 deadline. I guess that vendors wouldn’t be glad to see us gathering data for new structures while level=Modern isn’t complete. In the meantime, you are welcome to contribute and to motivate missing people to do the same. Best regards, Marcel
Re: The Unicode Standard and ISO
Hi Marcel > I don’t fully disagree with Asmus, as I suggested to make available > localizable (and effectively localized) libraries of message components, > rather than of entire messages. Could you possibly give some examples of the message components to which you refer please? Asmus wrote: > A middle ground is a shared terminology database that allows translators > working on different products to arrive at the same translation for the same > things. Translators already know how to use such databases in their work > flow, and integrating a shared one with a product-specific one is much easier > than trying to deal with a set of random error messages. I am not a linguist. I am interested in languages but my knowledge of languages is little more than that of general education, though I have written a song in French. http://www.users.globalnet.co.uk/~ngo/une_chanson.pdf So when Asmus wrote "Translators already know how to use such databases in their work flow, ", I do not know how to do that myself. > The challenge as I see it is to get them translated to all locales. Well, yes, that is a big challenge. It depends whether people want to get it done. In England, with its changeable weather, part of the culture is to talk about the weather. For example, at a bus stop talking about the weather with other people: it is sociable without being intrusive or controversial. Alas it did not occur to me that that might seem strange to some people who are not from England. http://www.english-at-home.com/speaking/talking-about-the-weather/ http://www.bbc.com/future/story/20151214-why-do-brits-talk-about-the-weather-so-much I remember when I wrote about localizable sentences in this mailing list in mid-April 2009, using sentences about the weather, I hoped, in hindsight rather naively, that people on the mailing list would be interested and that translations into many languages would be posted and then things would get going. 
In the event, only one person, Magnus Bodin, provided translations. Magnus provided translations into Swedish and also provided a translation for an additional sentence as well. I knew no Swedish myself. These translations have been extremely helpful in my research project as they demonstrate communication through the language barrier using encoded localizable sentences.

Yesterday I provided three example error message sentences.

https://www.unicode.org/mail-arch/unicode-ml/y2018-m06/0088.html

Please consider one of them, which could be output as a code number, say, ::4842357:; from an application program if someone enters a letter of the alphabet into a currency field, and then displayed localized into a language by first decoding using a sentence.dat UTF-16 text file for that language that includes a line that starts ::4842357:;| and then has the localization into that particular language, the language being any language that can be displayed using Unicode.

For English, the line in the sentence.dat file would be as follows.

::4842357:;|Data entry for the currency field must be either a whole positive number or a positive number to exactly two decimal places.

It would be great if some bilingual readers of this mailing list were to post a translation of the above line of text into another language.

In my research I am using an integral sign as a base character and circled digit characters. If possible, a character such as U+FFF7 could be encoded to be the base character as that would provide a unique unambiguous link to star space from Unicode plain text. However whether that happens at some future time will depend upon there being sufficient interest at that future time in using localizable sentences for communication through the language barrier.

William Overington

Tuesday 12 June 2018
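[Editorial note: the sentence.dat lookup described in the message above can be sketched in a few lines. The file name, the ::code:;| line format, and code 4842357 are taken from the post; the function names and the fallback behaviour are illustrative assumptions, not part of William's published scheme.]

```python
def load_sentences(path):
    """Parse a sentence.dat-style UTF-16 file into {code: sentence}.

    Each line is expected to look like:
        ::4842357:;|<localized sentence>
    (format as described in the post above).
    """
    table = {}
    with open(path, encoding="utf-16") as f:
        for line in f:
            line = line.strip()
            if line.startswith("::") and ":;|" in line:
                # "::4842357:;|Data entry..." -> ("4842357", "Data entry...")
                code, _, sentence = line[2:].partition(":;|")
                table[code] = sentence
    return table

def localize(code, table):
    """Return the localized sentence for a code number.

    Falls back to echoing the raw code (an assumed behaviour) when the
    per-language file has no entry for it.
    """
    return table.get(code, f"::{code}:;")
```

An application would emit only the code number; the receiving side would load the sentence.dat file for the user's language and display `localize("4842357", table)`.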
Re: The Unicode Standard and ISO
On Mon, 11 Jun 2018 16:32:45 +0100 (BST), William_J_G Overington via Unicode wrote: […] > Asmus Freytag wrote: > > > If you tried to standardize all error messages even in one language you > > would never arrive at something that would be universally useful. > > Well that is a big "If". One cannot standardize all pictures as emoji, but > emoji still get encoded, some every year now. > > I first learned to program back in the 1960s using the Algol 60 language on > an Elliott 803 mainframe computer, five track paper tape, > teleprinters to prepare a program on white tape, results out on coloured > tape, colours changed when the rolls changed. If I remember > correctly, error messages, either at compile time or at run time came out as > messages of a line number and an error number for compile > time errors and a number for a run time error. One then looked up the number > in the manual or on the enlarged version of the numbers > and the corresponding error messages that was mounted on the wall. > > > While some simple applications may find that all their needs for > > communicating with their users are covered, most would wish they had > > some other messages available. > > Yes, but more messages could be added to the list much more often than emoji > are added to The Unicode Standard, maybe every month > or every fortnight or every week if needed. > > > To adopt your scheme, they would need to have a bifurcated approach, where > > some messages follow the standard, while others do not (cannot). > > Not necessarily. A developer would just need to send in a request to Unicode > Inc. to add the needed extra sentences to the list and get a code number. > > > It's pushing this kind of impractical scheme that gives standardizers a bad > > name. > > It is not an impractical scheme. I don’t fully disagree with Asmus, as I suggested to make available localizable (and effectively localized) libraries of message components, rather than of entire messages. 
The challenge as I see it is to get them translated to all locales. For this I'm hoping that the advantage of improving user support upstream instead of spending more time on support fora would be obvious. By contrast I do disagree with the idea that industrial standards (as opposed to governmental procurement) are a safeguard against impractical schemes. Devising impractical specifications on industrial procurement hasn't even been a privilege of the French NB (referring to the examples in my e-mail: https://unicode.org/mail-arch/unicode-ml/y2018-m06/0082.html ), as demonstrated with the example of the hyphen conundrum where Unicode pushes the use of keyboard layouts featuring two distinct hyphens with same general category and same behavior, but different glyphs in some fonts whose designers didn’t think further than the original point of overly disambiguating hyphen semantics—while getting around similar traps with other punctuations. And in this thread I wanted to demonstrate that by focusing on the wrong priorities, i.e. legacy character names instead of the practicability of on-going encoding and the accurateness of specified decompositions—so that in some instances cedilla was used instead of comma below, Michael pointed out—, ISO/IEC JTC1 SC2/WG2 failed to do its part and missed its mission—and thus didn’t inspire a desire of extensive cooperation (and damaged the reputation of the whole ISO/IEC). Best regards, Marcel
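[Editorial note: Marcel's "libraries of message components, rather than of entire messages" can be sketched as follows. The component keys, the French strings, and the `compose` helper are invented for illustration; this is not an existing CLDR structure.]

```python
# A hypothetical library of localized message *components*: small reusable
# fragments that applications compose into full messages, instead of
# translating every complete message separately.
COMPONENTS = {
    "en": {"invalid-value": "Invalid value for {field}.",
           "expected-number": "A number was expected."},
    "fr": {"invalid-value": "Valeur non valide pour {field}.",
           "expected-number": "Un nombre était attendu."},
}

def compose(locale, keys, **params):
    """Join localized components into one message, falling back to English
    when the locale has no component library."""
    parts = COMPONENTS.get(locale, COMPONENTS["en"])
    return " ".join(parts[k].format(**params) for k in keys)
```

For example, `compose("fr", ["invalid-value", "expected-number"], field="montant")` yields a fully French message from two shared components. Real message translation is of course context-sensitive, which is exactly the objection raised elsewhere in this thread.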
Re: The Unicode Standard and ISO
Steven R. Loomis wrote: >Marcel, > The idea is not necessarily without merit. However, CLDR does not usually > expand scope just because of a suggestion. I usually recommend creating a new project first - gathering data, looking at and talking to projects to ascertain the usefulness of common messages.. one of the barriers to adding new content for CLDR is not just the design, but collecting initial data. When emoji or sub-territory names were added, many languages were included before it was added to CLDR. Well, maybe usually, but perhaps not this time? I opine that if it is going to be done it needs to be done under the umbrella of Unicode Inc. and have lots of people contribute a bit: that way businesses may well use it because being part of Unicode Inc. they will have provenance over there being no possibility of later claims for payment. Not that any such claim would necessarily be made, but they need to know that. Also having lots of people can help get the translations done as there are a number of people who are bilingual who might like to pitch in. So, give the idea a sound chance of being implemented please. Asmus Freytag wrote: > If you tried to standardize all error messages even in one language you would > never arrive at something that would be universally useful. Well that is a big "If". One cannot standardize all pictures as emoji, but emoji still get encoded, some every year now. I first learned to program back in the 1960s using the Algol 60 language on an Elliott 803 mainframe computer, five track paper tape, teleprinters to prepare a program on white tape, results out on coloured tape, colours changed when the rolls changed. If I remember correctly, error messages, either at compile time or at run time came out as messages of a line number and an error number for compile time errors and a number for a run time error. 
One then looked up the number in the manual or on the enlarged version of the numbers and the corresponding error messages that was mounted on the wall.

> While some simple applications may find that all their needs for communicating with their users are covered, most would wish they had some other messages available.

Yes, but more messages could be added to the list much more often than emoji are added to The Unicode Standard, maybe every month or every fortnight or every week if needed.

> To adopt your scheme, they would need to have a bifurcated approach, where some messages follow the standard, while others do not (cannot).

Not necessarily. A developer would just need to send in a request to Unicode Inc. to add the needed extra sentences to the list and get a code number.

> It's pushing this kind of impractical scheme that gives standardizers a bad name.

It is not an impractical scheme. It can be implemented straightforwardly using the star space system that I have devised.

http://www.users.globalnet.co.uk/~ngo/An_encoding_space_designed_for_application_in_encoding_localizable_sentences.pdf

http://www.users.globalnet.co.uk/~ngo/localizable_sentences_the_novel_chapter_019.pdf

Start off with space for error messages and number them from 4840001 through to 4849999 and allocate meanings as needed. Then a side view of a 4-8-4 locomotive facing to the left could be a logo for the project. Big 4-8-4 locomotives were built years ago. If people could do that then surely people can implement this project successfully now if they want to do so.

For example, one error message could be as follows: Data entry for the currency field must be either a whole positive number or a positive number to exactly two decimal places.

Another could be as follows: Division by zero was attempted.

Yet another could be as follows: The number of opening parentheses in the expression does not match the number of closing parentheses.
If some day more than error messages are needed, these can be provided within star space as it is vast. http://www.users.globalnet.co.uk/~ngo/a_completed_publication_about_localizable_sentences_research.pdf William Overington Monday 11 June 2018
RE: The Unicode Standard and ISO
The scheme I have been using for years is a short message in the local language giving the main point of the error, together with a detailed message in English. One has to see it to believe what happens to messages translated mechanically from English to bidi languages when data is embedded in the text. Best Regards, Jonathan Rosenne -Original Message- From: William_J_G Overington [mailto:wjgo_10...@btinternet.com] Sent: Monday, June 11, 2018 6:33 PM To: verd...@wanadoo.fr; Jonathan Rosenne; asm...@ix.netcom.com; Steven R. Loomis; jameskass...@gmail.com; charupd...@orange.fr; peter...@microsoft.com; richard.wording...@ntlworld.com Cc: unicode@unicode.org Subject: Re: The Unicode Standard and ISO Steven R. Loomis wrote: >Marcel, > The idea is not necessarily without merit. However, CLDR does not usually > expand scope just because of a suggestion. I usually recommend creating a new project first - gathering data, looking at and talking to projects to ascertain the usefulness of common messages. One of the barriers to adding new content for CLDR is not just the design, but collecting initial data. When emoji or sub-territory names were added, many languages were included before it was added to CLDR. Well, maybe usually, but perhaps not this time? I opine that if it is going to be done, it needs to be done under the umbrella of Unicode Inc., with lots of people each contributing a bit: that way businesses may well use it, because its being part of Unicode Inc. assures them that no later claims for payment are possible. Not that any such claim would necessarily be made, but they need to know that. Having lots of people can also help get the translations done, as a number of bilingual people might like to pitch in. So please give the idea a sound chance of being implemented.
Asmus Freytag wrote: > If you tried to standardize all error messages even in one language you would > never arrive at something that would be universally useful. Well, that is a big "if". One cannot standardize all pictures as emoji, yet emoji still get encoded, some every year now. I first learned to program back in the 1960s, using the Algol 60 language on an Elliott 803 mainframe computer: five-track paper tape, teleprinters to prepare a program on white tape, results out on coloured tape, the colours changing when the rolls changed. If I remember correctly, error messages came out as a line number plus an error number for compile-time errors, and as a number alone for run-time errors. One then looked the number up in the manual, or on the enlarged chart of numbers and corresponding error messages that was mounted on the wall. > While some simple applications may find that all their needs for > communicating with their users are covered, most would wish they had some > other messages available. Yes, but more messages could be added to the list much more often than emoji are added to The Unicode Standard: every month, every fortnight, or every week if needed. > To adopt your scheme, they would need to have a bifurcated approach, where > some messages follow the standard, while others do not (cannot). Not necessarily. A developer would just need to send a request to Unicode Inc. to add the needed extra sentences to the list and get a code number. > It's pushing this kind of impractical scheme that gives standardizers a bad > name. It is not an impractical scheme. It can be implemented straightforwardly using the star space system that I have devised.
http://www.users.globalnet.co.uk/~ngo/An_encoding_space_designed_for_application_in_encoding_localizable_sentences.pdf http://www.users.globalnet.co.uk/~ngo/localizable_sentences_the_novel_chapter_019.pdf Start off with space for error messages, number them from 4840001 through to 4849999, and allocate meanings as needed. Then a side view of a 4-8-4 locomotive facing to the left could be a logo for the project. Big 4-8-4 locomotives were built years ago. If people could build those, then surely people can implement this project successfully now if they want to do so. For example, one error message could be as follows: Data entry for the currency field must be either a whole positive number or a positive number to exactly two decimal places. Another could be as follows: Division by zero was attempted. Yet another could be as follows: The number of opening parentheses in the expression does not match the number of closing parentheses. If some day more than error messages are needed, these can be provided within star space, as it is vast. http://www.users.globalnet.co.uk/~ngo/a_completed_publication_about_localizable_sentences_research.pdf William Overington Monday 11 June 2018
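For illustration only: a minimal sketch, in Python, of how a numbered message registry of the kind described above might be looked up. The code numbers, the per-locale table, and the French translation are all hypothetical, invented for this sketch; they are not part of any standard.

```python
# Hypothetical registry mapping a standardized message number to
# per-locale sentences. The numbers and the French translation are
# invented for illustration only; the English sentences are the
# examples given in the message above.
MESSAGES = {
    4840001: {
        "en": "Division by zero was attempted.",
        "fr": "Une division par zéro a été tentée.",
    },
    4840002: {
        "en": "The number of opening parentheses in the expression "
              "does not match the number of closing parentheses.",
    },
}

def localized_message(code: int, locale: str, fallback: str = "en") -> str:
    """Return the sentence for `code` in `locale`, falling back to English,
    then to a bare numeric placeholder if the code is unknown."""
    translations = MESSAGES.get(code, {})
    return translations.get(locale, translations.get(fallback, f"Error {code}"))
```

With such a table, `localized_message(4840001, "fr")` returns the French sentence, while a locale with no translation falls back to the English text.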
RE: The Unicode Standard and ISO
> > From the outset, Unicode and the US national body tried repeatedly to > > engage with SC35 and SC35/WG5, […] > As a reminder: The actual SC35 is in total disconnect from the same SC35 as > it was from the mid-eighties to mid-nineties and beyond. Edit: ISO/IEC JTC1 SC35 was founded in 1999. (In the mentioned timespan, there was SC18/WG9.) > > informing them of UTS #35 (LDML) and CLDR, but were ignored. SC35 didn’t > > appear to be interested > [, or appeared to be interested in ] > > a pet project and not in what is actually being used in industry. It seems it isn’t even a pet project; today it’s nothing but a deplorable mismanagement mess. In my opinion, at some point the inadvertent French NB will apologize to the US National Body and to the Unicode Consortium. As of now, I apologize for my part. Best regards, Marcel
RE: The Unicode Standard and ISO
On Sun, 10 Jun 2018 15:11:48 +, Peter Constable via Unicode wrote: > > > ... For another part it [sync with ISO/IEC 15897] failed because the > > Consortium refused to cooperate, despite > > repeated proposals for a merger of both instances. > > First, ISO/IEC 15897 is built on a data-format specification, ISO/IEC TR > 14652, that never achieved the support > needed to become an international standard, and has since been withdrawn. > (TRs cannot remain TRs forever.) > Now, JTC1/SC35 began work four or five years ago to create a data-format > specification for this, Approved Work Item 30112. > From the outset, Unicode and the US national body tried repeatedly to engage > with SC35 and SC35/WG5, The involvement in this decade of ISO/IEC JTC1 SC35 WG5 adds a scary level of complexity unrelated to the core issues. Andrew West already hinted that the stuff was moved from SC22 to SC35, but it took me some extra investigation to get the point. As a reminder: The actual SC35 is in total disconnect from the same SC35 as it was from the mid-eighties to mid-nineties and beyond. > informing them of UTS #35 (LDML) and CLDR, but were ignored. SC35 didn’t > appear to be interested [, or appeared to be interested in ] > a pet project and not in what is actually being used in industry. Sorry, I had some difficulty understanding this and filled in what I think may have been elided. > After several failed attempts, Unicode and the USNB gave up trying. Thank you for bringing up this key information. > > So, any suggestion that Unicode has failed to cooperate or is dropping the > ball with regard to locale data and ISO > is simply uninformed. That is correct. So I think this thread has now led to a substantive response, and all concerned people on this List are welcome to take note of these new facts showing that Unicode is totally innocent in ISO/IEC locale data issues.
If that doesn’t suffice to convince missing people to cooperate in reviewing French data in CLDR, they may be pleased to know that I will try to keep helping as best I can. Thank you everyone. Best regards, Marcel > > > Peter > > > From: Unicode On Behalf Of Mark Davis ☕️ via Unicode > Sent: Thursday, June 7, 2018 6:20 AM > To: Marcel Schneider > Cc: UnicodeMailing > Subject: Re: The Unicode Standard and ISO > > A few facts. > > > ... Consortium refused till now to synchronize UCA and ISO/IEC 14651. > > ISO/IEC 14651 and Unicode have longstanding cooperation. Ken Whistler could > speak to the synchronization level in more detail, but the above statement is inaccurate. > > > ... For another part it [sync with ISO/IEC 15897] failed because the > > Consortium refused to cooperate, despite > repeated proposals for a merger of both instances. > > I recall no serious proposals for that. > > (And in any event — very unlike the synchrony with 10646 and 14651 — ISO > 15897 brought no value to the table. Certainly nothing to outweigh the considerable costs of maintaining synchrony. Completely inadequate structure for modern system requirements, no particular industry support, and scant content: see Wikipedia for "The registry has not been updated since December 2001".) > > Mark > […]
RE: The Unicode Standard and ISO
> ... For another part it [sync with ISO/IEC 15897] failed because the > Consortium refused to cooperate, despite repeated proposals for a merger of both instances. First, ISO/IEC 15897 is built on a data-format specification, ISO/IEC TR 14652, that never achieved the support needed to become an international standard, and has since been withdrawn. (TRs cannot remain TRs forever.) Now, JTC1/SC35 began work four or five years ago to create a data-format specification for this, Approved Work Item 30112. From the outset, Unicode and the US national body tried repeatedly to engage with SC35 and SC35/WG5, informing them of UTS #35 (LDML) and CLDR, but were ignored. SC35 didn’t appear to be interested a pet project and not in what is actually being used in industry. After several failed attempts, Unicode and the USNB gave up trying. So, any suggestion that Unicode has failed to cooperate or is dropping the ball with regard to locale data and ISO is simply uninformed. Peter From: Unicode On Behalf Of Mark Davis ☕️ via Unicode Sent: Thursday, June 7, 2018 6:20 AM To: Marcel Schneider Cc: UnicodeMailing Subject: Re: The Unicode Standard and ISO A few facts. > ... Consortium refused till now to synchronize UCA and ISO/IEC 14651. ISO/IEC 14651 and Unicode have longstanding cooperation. Ken Whistler could speak to the synchronization level in more detail, but the above statement is inaccurate. > ... For another part it [sync with ISO/IEC 15897] failed because the > Consortium refused to cooperate, despite repeated proposals for a merger of both instances. I recall no serious proposals for that. (And in any event — very unlike the synchrony with 10646 and 14651 — ISO 15897 brought no value to the table. Certainly nothing to outweigh the considerable costs of maintaining synchrony. Completely inadequate structure for modern system requirements, no particular industry support, and scant content: see Wikipedia for "The registry has not been updated since December 2001".)
Mark On Thu, Jun 7, 2018 at 1:25 PM, Marcel Schneider via Unicode <unicode@unicode.org> wrote: On Thu, 17 May 2018 09:43:28 -0700, Asmus Freytag via Unicode wrote: > > On 5/17/2018 8:08 AM, Martinho Fernandes via Unicode wrote: > > Hello, > > > > There are several mentions of synchronization with related standards in > > unicode.org, e.g. in > > https://www.unicode.org/versions/index.html, > > and > > https://www.unicode.org/faq/unicode_iso.html. > > However, all such mentions > > never mention anything other than ISO 10646. > > Because that is the standard for which there is an explicit understanding by > all involved > relating to synchronization. There have been occasionally some challenging > differences > in the process and procedures, but generally the synchronization is being > maintained, > something that's helped by the fact that so many people are active in both > arenas. Perhaps the cause-effect relationship is somewhat unclear. I think that many people being active in both arenas is helped by the fact that there is a strong will to maintain synching.
If there were similar policies notably for ISO/IEC 14651 (collation) and ISO/IEC 15897 (locale data), ISO/IEC 10646 would be far from standing alone in the field of Unicode-ISO/IEC cooperation. > > There are really no other standards where the same is true to the same extent. > > > > I was wondering which ISO standards other than ISO 10646 specify the > > same things as the Unicode Standard, and of those, which ones are > > actively kept in sync. This would be of importance for standardization > > of Unicode facilities in the C++ language (ISO 14882), as reference to > > ISO standards is generally preferred in ISO standards. > > > One of the areas the Unicode Standard differs from ISO 10646 is that its > conception > of a character's identity implicitly contains that character's properties
Re: The Unicode Standard and ISO
On Sat, 9 Jun 2018 21:21:40 -0700, Steven R. Loomis via Unicode wrote: > > Marcel, > The idea is not necessarily without merit. However, CLDR does not usually > expand scope just because of a suggestion. > > I usually recommend creating a new project first - gathering data, looking at > and talking to projects to ascertain the usefulness > of common messages. One of the barriers to adding new content for CLDR is > not just the design, but collecting initial data. > When emoji or sub-territory names were added, many languages were included > before it was added to CLDR. We know it took years to collect the subterritory names and make sure the list and translations are complete. > > Also note CLDR does have some typographical terms for use in UI, such as > 'bold' and 'italic' I gather that these are intended for tooltips on basic formatting facilities. High-end software like Microsoft Office has many more, and adds tooltips showing instructions for use as part of a corporate strategy aimed at raising usability and overall quality. So I wonder whether there are limits on software vendors cooperating with competitors to pool UI content. This point and others would be clarified in the preliminary stage that you outlined above, but I don’t feel in a position to carry it out, at least not now, as I’m focusing on our national data in CLDR and on keyboard layouts and standards. Anyhow, thank you for letting us know.
Best regards, Marcel > Regards, > Steven > On Sat, Jun 9, 2018 at 3:41 PM Marcel Schneider via Unicode wrote: > > On Sat, 9 Jun 2018 12:56:28 -0700, Asmus Freytag via Unicode wrote: > > > > On 6/9/2018 12:01 PM, Marcel Schneider via Unicode wrote: > > > Still a computer should be understandable off-line, so CLDR providing a > > > standard library of error messages could be > > > appreciated by the industry. > > The kind of translations that CLDR accumulates, like day and month names, > language and territory names, are a widely > > applicable subset and one that is commonly required in machine generated or > > machine-assembled text (like displaying > > the date, providing pick lists for configuration of locale settings, etc). > > The universe of possible error messages is a completely different beast. > > If you tried to standardize all error messages even in one language you > > would never arrive at something that would be > > universally useful. While some simple applications may find that all their > > needs for communicating with their users are > > covered, most would wish they had some other messages available. > > … > > > However, a high-quality terminology database recommends itself (and doesn't > > need any procurement standards). > > Ultimately, it was its demonstrated usefulness that drove the adoption of > > CLDR. > > This is why I’m so hopeful that CLDR will go much farther than date and time > and other locale settings, and emoji names and keywords. > >
Re: The Unicode Standard and ISO
On Sat, 9 Jun 2018 12:56:28 -0700, Asmus Freytag via Unicode wrote: […] > It's pushing this kind of impractical scheme that gives standardizers a bad > name. > > Especially if it is immediately tied to governmental procurement, forcing > people to adopt it (or live with it) > whether it provides any actual benefit. Or not. What I left untold is that governmental action does effectively work in both directions (examples follow), but governments hold no monopoly on that ambivalence. When the French NB positioned against encoding Œœ in ISO/IEC 8859-1:1986, it wasn’t the government but a manufacturer who wanted to get around adding support for this letter in printers. It’s not fully clear to me why the same happened to Dutch IJij. Anyway, as a result we had (and, legacy doing the rest, still have) two digitally malfunctioning languages. Thanks to the work of Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Loek Zeckendorf (ISO/IEC 6937:1983), and from 1987 on thanks to the work of Joe Becker, Lee Collins and Mark Davis from Apple and Xerox, things started working fine, and work better and better thanks to Mark Davis’s ongoing commitment. Industrial and governmental action are both ambivalent by nature, simply because human action may happen to be short-sighted or far-sighted for a variety of reasons. When the French NB issued a QWERTY keyboard standard in 1973 and revised it in 1976, it was driven by short-sighted industrial interests rather than governmental procurement. End-users never adopted it, there was no market, and it has recently been withdrawn. When governmental action, hard scientific work, human genius and a nascent industrial effort brought into existence a working keyboard for French that is usefully transposable to many other locales as well, it was enthusiastically adopted by end-users, and everybody urged the NB to standardize it.
But the industry first asked for an international keyboard standard as a precondition… (which ended up being an excellent idea as well). The rest of the story may be spared, as the conclusion is already clear. There is one impractical scheme that bothers me, and that is that we have two hyphens, because the ASCII hyphen was duplicated as U+2010. Now since font designers (e.g. Lucida Sans Unicode) took the hyphen conundrum seriously to avoid spoofing, or for whatever reason, we’re supposed to have keyboard layouts with two hyphens, both being Gc=Pd. That is where the related ISO WG2 could have been useful by positioning against U+2010, because disambiguating the minus sign U+2212 and keeping the hyphen-minus U+002D in use, like e.g. the period, would have been sufficient. On the other hand, it is entirely Unicode’s merit that we have two curly apostrophes, one that doesn’t break hashtags (U+02BC, Gc=Lm), and one that does (U+2019, Gc=Pf), as has been shared on this List (thanks to André Schappo). But despite a language being in a position to make distinct use of each of them, depending on whether the apostrophe helps denote a particular sound or marks an elision (and despite already having a physical keyboard and driver that would make distinct entry very easy and straightforward), submitting feedback hasn’t managed to raise concern so far. This is an example of how the industry and the governments united in the Unicode Consortium are saving end-users lots of trouble. Thank you. Marcel
Re: The Unicode Standard and ISO
Marcel, The idea is not necessarily without merit. However, CLDR does not usually expand scope just because of a suggestion. I usually recommend creating a new project first - gathering data, looking at and talking to projects to ascertain the usefulness of common messages. One of the barriers to adding new content for CLDR is not just the design, but collecting initial data. When emoji or sub-territory names were added, many languages were included before it was added to CLDR. Also note CLDR does have some typographical terms for use in UI, such as 'bold' and 'italic'. Regards, Steven On Sat, Jun 9, 2018 at 3:41 PM Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On Sat, 9 Jun 2018 12:56:28 -0700, Asmus Freytag via Unicode wrote: > > > > On 6/9/2018 12:01 PM, Marcel Schneider via Unicode wrote: > > > Still a computer should be understandable off-line, so CLDR providing > a standard library of error messages could be > > > appreciated by the industry. > > The kind of translations that CLDR accumulates, like day and month > names, language and territory names, are a widely > > applicable subset and one that is commonly required in machine generated > or machine-assembled text (like displaying > > the date, providing pick lists for configuration of locale settings, > etc). > > The universe of possible error messages is a completely different beast. > > If you tried to standardize all error messages even in one language you > would never arrive at something that would be > > universally useful. While some simple applications may find that all > their needs for communicating with their users are > > covered, most would wish they had some other messages available. > … > > > However, a high-quality terminology database recommends itself (and > doesn't need any procurement standards). > > Ultimately, it was its demonstrated usefulness that drove the adoption > of CLDR.
> > This is why I’m so hopeful that CLDR will go much farther than date and > time and other locale settings, and emoji names and keywords. >
Re: The Unicode Standard and ISO
On Sat, 9 Jun 2018 12:56:28 -0700, Asmus Freytag via Unicode wrote: > > On 6/9/2018 12:01 PM, Marcel Schneider via Unicode wrote: > > Still a computer should be understandable off-line, so CLDR providing a > > standard library of error messages could be > > appreciated by the industry. > > The kind of translations that CLDR accumulates, like day and month names, > language and territory names, are a widely > applicable subset and one that is commonly required in machine generated or > machine-assembled text (like displaying > the date, providing pick lists for configuration of locale settings, etc). > The universe of possible error messages is a completely different beast. > If you tried to standardize all error messages even in one language you would > never arrive at something that would be > universally useful. While some simple applications may find that all their > needs for communicating with their users are > covered, most would wish they had some other messages available. Indeed, error messages, although technical, are like the world’s books: a never-ending production of content. To account for this infinity, I was not proposing a closed set of messages to replace application libraries able to display message #123. In fact I wrote first: “If to date, automatic [automated] translation of technical English still does not work, then I’d suggest that CLDR feature a complete message library allowing to compose any localized piece of information.” Here the piece of information displayed by the application is like a Lego spacecraft, the CLDR messages like Lego bricks. I haven’t played with Lego in a very long time, but as a boy I learned how it works. I even remember that when building a construct, it often happened that some bricks were “missing”. A Lego box is complete with respect to one or several models, but my mom, showing me the boxes on the shelves, once explained that they’re composed in a way that you’ll always lack something [when trying to build further].
— That doesn’t prevent Lego from thriving, nor many people from enjoying it. > To adopt your scheme, they would need to have a bifurcated approach, where > some messages follow the standard, > while others do not (cannot). At that point, why bother? Determining whether > some message can be rewritten to follow > the standard adds another level of complexity while you'd need to have > translation resources for all the non-standard ones anyway. When CLDR libraries make it possible to generate 98 % of info boxes well translated, human translators may focus on the remaining 2 %. If for any reason they cannot, the vendor will still get far fewer support requests than with the ill-translated messages. > A middle ground is a shared terminology database that allows translators > working on different products to arrive at the same translation > for the same things. Translators already know how to use such databases in > their work flow, and integrating a shared one with > a product-specific one is much easier than trying to deal with a set of > random error messages. If the scheme you outline works so well, where do the reported oddities come from? Obviously terminology is not everything; it’s like Lego bricks without studs: terms alone don’t interlock, and therefore the user cannot make sense of the result. This is where CLDR’s hopefully forthcoming localizable message bricks would come into action, helping automated translation software compose understandable output using patterns. Google Translate is unable to do that, as shown in the English and French translations of this sentence found on a page of the Finnish NB: https://www.sfs.fi/ajankohtaista/uutiset/nappaimistoon_tarjolla_lisayksia.4249.news Finnish: Kielitoimiston ohjeen mukaan esimerkiksi vieraskielisissä nimissä on pyrittävä säilyttämään kaikki tarkkeet. [Roughly: According to the Language Office’s guidance, one should try to preserve all diacritics in foreign-language names, for example.] Google English: According to the Language Office, for example, in the name of a foreign language, it is necessary to maintain all the checkpoints.
Google French: Selon le Language Office, par exemple, au nom d'une langue étrangère, il est nécessaire de maintenir tous les points de contrôle. > It's pushing this kind of impractical scheme that gives standardizers a bad > name. > > Especially if it is immediately tied to governmental procurement, forcing > people to adopt it (or live with it) whether it provides any actual benefit. These statements make much sense to me… > However, a high-quality terminology database recommends itself (and doesn't > need any procurement standards). > Ultimately, it was its demonstrated usefulness that drove the adoption of > CLDR. This is why I’m so hopeful that CLDR will go much farther than date and time and other locale settings, and emoji names and keywords. Best regards, Marcel
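A rough sketch of the “Lego brick” idea discussed above: localized sentence patterns with named placeholders, composed at runtime. The pattern identifiers, pattern strings, and locale table below are invented for illustration; production systems use richer formats such as ICU MessageFormat, which also handles plurals and gender.

```python
# Invented pattern library for illustration: each locale supplies
# sentence patterns ("bricks") with named placeholders, and the
# application composes a message by filling them in.
PATTERNS = {
    "en": {"file-not-found": "The file {name} was not found in {folder}."},
    "fr": {"file-not-found": "Le fichier {name} est introuvable dans {folder}."},
}

def compose(locale: str, pattern_id: str, **args: str) -> str:
    """Fill the locale's pattern with application-supplied values,
    falling back to English when the locale has no pattern table."""
    pattern = PATTERNS.get(locale, PATTERNS["en"])[pattern_id]
    return pattern.format(**args)
```

The point of the sketch is that only the patterns need translating, once, while the application supplies the variable parts at runtime.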
Re: The Unicode Standard and ISO
On 6/9/2018 12:01 PM, Marcel Schneider via Unicode wrote: Still a computer should be understandable off-line, so CLDR providing a standard library of error messages could be appreciated by the industry. The kind of translations that CLDR accumulates, like day and month names, language and territory names, are a widely applicable subset and one that is commonly required in machine generated or machine-assembled text (like displaying the date, providing pick lists for configuration of locale settings, etc). The universe of possible error messages is a completely different beast. If you tried to standardize all error messages even in one language you would never arrive at something that would be universally useful. While some simple applications may find that all their needs for communicating with their users are covered, most would wish they had some other messages available. To adopt your scheme, they would need to have a bifurcated approach, where some messages follow the standard, while others do not (cannot). At that point, why bother? Determining whether some message can be rewritten to follow the standard adds another level of complexity while you'd need to have translation resources for all the non-standard ones anyway. A middle ground is a shared terminology database that allows translators working on different products to arrive at the same translation for the same things. Translators already know how to use such databases in their work flow, and integrating a shared one with a product-specific one is much easier than trying to deal with a set of random error messages. It's pushing this kind of impractical scheme that gives standardizers a bad name. Especially if it is immediately tied to governmental procurement, forcing people to adopt it (or live with it) whether it provides any actual benefit. However, a high-quality terminology database recommends itself (and doesn't need any procurement standards).
Ultimately, it was its demonstrated usefulness that drove the adoption of CLDR. A./
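As a sketch of the “shared terminology database” idea described above: independent translators look up the agreed target-language rendering of a term so different products stay consistent. The language pair, terms, and translations below are invented for illustration; real termbases (e.g. TBX-based ones) carry much more metadata per entry.

```python
from typing import Optional

# Invented shared terminology table, keyed by (source, target) language
# pair: one agreed target-language term per source term, so that
# translators working on different products render it identically.
TERMBASE = {
    ("en", "de"): {"clipboard": "Zwischenablage", "folder": "Ordner"},
}

def agreed_term(src_lang: str, dst_lang: str, term: str) -> Optional[str]:
    """Return the agreed translation, or None if the term is not in the base."""
    return TERMBASE.get((src_lang, dst_lang), {}).get(term)
```

A translation workflow would consult such a lookup before inventing a new rendering, and fall back to the product-specific glossary only when the shared base returns nothing.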
RE: The Unicode Standard and ISO
On the other hand, most end-users don’t appreciate getting “a screenful of all-in-English” when “something happened.” If even big companies still haven’t succeeded in getting automated computer translation to work for error messages, then best practice could eventually be to provide an internet link with every message. Given that web pages are generally less sibylline than error messages, they may be better translatable, and Philippe Verdy’s hint is therefore a working solution for localized software end-user support. Still, a computer should be understandable off-line, so CLDR providing a standard library of error messages could be appreciated by the industry. Best regards, Marcel On Sat, 9 Jun 2018 18:14:17 +, Jonathan Rosenne via Unicode wrote: > > Translated error messages are a horror story. Often I have to play around > with my locale settings to avoid them. > Using computer translation on programming error messages is nowhere near > being useful. > > Best Regards, > > Jonathan Rosenne > > From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Philippe > Verdy via Unicode > Sent: Saturday, June 09, 2018 7:49 PM > To: Marcel Schneider > Cc: UnicodeMailingList > Subject: Re: The Unicode Standard and ISO 2018-06-09 17:22 GMT+02:00 Marcel Schneider via Unicode : On Sat, 9 Jun 2018 09:47:01 +0100, Richard Wordingham via Unicode wrote: > > > > On Sat, 9 Jun 2018 08:23:33 +0200 (CEST) > > Marcel Schneider via Unicode wrote: > > > > > > Where there is opportunity for productive sync and merging with is > > > > glibc. We have had some discussions, but more needs to be done- > > > > especially a lot of tooling work. Currently many bug reports are > > > > duplicated between glibc and cldr, a sort of manual > > > > synchronization. Help wanted here. > > > > > > Noted. For my part, sadly for C libraries I’m unlikely to be of any > > > help. > > > > I wonder how much of that comes under the sad category of "better not > > translated".
If an English speaker has to resort to search engines to > > understand, let alone fix, a reported problem, it may be better for a > > non-English speaker to search for the error message in English, and then > > with luck he may find a solution he can understand. > > Then adding a "Display in English" button in the message box is best practice. > Still I’ve never encountered any yet, and I guess this is because such a > facility > would be understood as an admission that up to now, i18n is partly a failure. - Navigate any page on the web in another language than yours, with a Google Translate plugin enabled on your browser. You'll have the choice of seeing the automatic translation or the original. - Many websites that have pages proposed in multiple languages offer such buttons to select the language you want to see (and not necessarily falling back to English, because the original may as well be in another language and English is an approximate translation, notably for sites in Asia, Africa and South America). - Even the official websites of the European Union (or EEA) offer such a choice (but at least the available translations are correctly reviewed for European languages; not all pages are translated into all official languages of member countries, but this is the case for most pages intended to be read by the general public, while pages about ongoing work, or technical reports for specialists, or recent legal decisions may not be translated except into a few "working languages", generally English, German, and French, sometimes Italian, the 4 languages spoken officially in multiple countries in the EEA including at least one in the European Union). So it's not a "failure" but a feature to be able to select the language, and to know when a proposed translation is fully or partly automated.
RE: The Unicode Standard and ISO
Translated error messages are a horror story. Often I have to play around with my locale settings to avoid them. Using computer translation on programming error messages is nowhere near being useful. Best Regards, Jonathan Rosenne From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Philippe Verdy via Unicode Sent: Saturday, June 09, 2018 7:49 PM To: Marcel Schneider Cc: UnicodeMailingList Subject: Re: The Unicode Standard and ISO 2018-06-09 17:22 GMT+02:00 Marcel Schneider via Unicode <unicode@unicode.org>: On Sat, 9 Jun 2018 09:47:01 +0100, Richard Wordingham via Unicode wrote: > > On Sat, 9 Jun 2018 08:23:33 +0200 (CEST) > Marcel Schneider via Unicode > <unicode@unicode.org> wrote: > > > > Where there is opportunity for productive sync and merging with is > > > glibc. We have had some discussions, but more needs to be done- > > > especially a lot of tooling work. Currently many bug reports are > > > duplicated between glibc and cldr, a sort of manual > > > synchronization. Help wanted here. > > > > Noted. For my part, sadly for C libraries I’m unlikely to be of any > > help. > > I wonder how much of that comes under the sad category of "better not > translated". If an English speaker has to resort to search engines to > understand, let alone fix, a reported problem, it may be better for a > non-English speaker to search for the error message in English, and then > with luck he may find a solution he can understand. Then adding a "Display in English" button in the message box is best practice. Still I’ve never encountered any yet, and I guess this is because such a facility would be understood as an admission that up to now, i18n is partly a failure. - Navigate any page on the web in a language other than yours, with a Google Translate plugin enabled in your browser. You'll have the choice of seeing the automatic translation or the original. 
- Many websites that propose pages in multiple languages offer such buttons to select the language you want to see (and not necessarily falling back to English, because the original may as well be in another language, with English an approximate translation, notably for sites in Asia, Africa and South America). - Even the official websites of the European Union (or EEA) offer such a choice (but at least the available translations are correctly reviewed for European languages; not all pages are translated into all official languages of member countries, but this is the case for most pages intended to be read by the general public, while pages about ongoing work, or technical reports for specialists, or recent legal decisions may not be translated except into a few "working languages", generally English, German, and French, sometimes Italian, the 4 languages spoken officially in multiple countries in the EEA, including at least one in the European Union). So it's not a "failure" but a feature to be able to select the language, and to know when a proposed translation is fully or partly automated.
Re: The Unicode Standard and ISO
2018-06-09 17:22 GMT+02:00 Marcel Schneider via Unicode : > On Sat, 9 Jun 2018 09:47:01 +0100, Richard Wordingham via Unicode wrote: > > > > On Sat, 9 Jun 2018 08:23:33 +0200 (CEST) > > Marcel Schneider via Unicode wrote: > > > > > > Where there is opportunity for productive sync and merging with is > > > > glibc. We have had some discussions, but more needs to be done- > > > > especially a lot of tooling work. Currently many bug reports are > > > > duplicated between glibc and cldr, a sort of manual > > > > synchronization. Help wanted here. > > > > > > Noted. For my part, sadly for C libraries I’m unlikely to be of any > > > help. > > > > I wonder how much of that comes under the sad category of "better not > > translated". If an English speaker has to resort to search engines to > > understand, let alone fix, a reported problem, it may be better for a > > non-English speaker to search for the error message in English, and then > > with luck he may find a solution he can understand. > > Then adding a "Display in English" button in the message box is best > practice. > Still I’ve never encountered any yet, and I guess this is because such a > facility > would be understood as an admission that up to now, i18n is partly a > failure. - Navigate any page on the web in a language other than yours, with a Google Translate plugin enabled in your browser. You'll have the choice of seeing the automatic translation or the original. - Many websites that propose pages in multiple languages offer such buttons to select the language you want to see (and not necessarily falling back to English, because the original may as well be in another language, with English an approximate translation, notably for sites in Asia, Africa and South America). 
- Even the official websites of the European Union (or EEA) offer such a choice (but at least the available translations are correctly reviewed for European languages; not all pages are translated into all official languages of member countries, but this is the case for most pages intended to be read by the general public, while pages about ongoing work, or technical reports for specialists, or recent legal decisions may not be translated except into a few "working languages", generally English, German, and French, sometimes Italian, the 4 languages spoken officially in multiple countries in the EEA, including at least one in the European Union). So it's not a "failure" but a feature to be able to select the language, and to know when a proposed translation is fully or partly automated.
Re: The Unicode Standard and ISO
On Sat, 9 Jun 2018 09:47:01 +0100, Richard Wordingham via Unicode wrote: > > On Sat, 9 Jun 2018 08:23:33 +0200 (CEST) > Marcel Schneider via Unicode wrote: > > > > Where there is opportunity for productive sync and merging with is > > > glibc. We have had some discussions, but more needs to be done- > > > especially a lot of tooling work. Currently many bug reports are > > > duplicated between glibc and cldr, a sort of manual > > > synchronization. Help wanted here. > > > > Noted. For my part, sadly for C libraries I’m unlikely to be of any > > help. > > I wonder how much of that comes under the sad category of "better not > translated". If an English speaker has to resort to search engines to > understand, let alone fix, a reported problem, it may be better for a > non-English speaker to search for the error message in English, and then > with luck he may find a solution he can understand. Then adding a "Display in English" button in the message box is best practice. Still I’ve never encountered any yet, and I guess this is because such a facility would be understood as an admission that up to now, i18n is partly a failure. > In a related vein, > one hears reports of people using English as the interface language, > because they can't understand the messages allegedly in their native > language. If, to date, automatic translation of technical English still does not work, then I’d suggest that CLDR feature a complete message library allowing any localized piece of information to be composed. But such an attempt requires that all available human resources really focus on the project, instead of being diverted by interpersonal discord. People sulking around a project are an indicator of poor project management that brands dissenters as enemies, out of an inability to behave diplomatically for lack of social skills. At least that’s what they’d teach you in any management school. 
The way Unicode behaves toward William Overington is in my opinion a striking example of mismanagement. In one dimension I can see, the "localizable sentences" that William invented and that he actively promotes do fit exactly into the scheme of localizable information elements suggested in the preceding paragraph. I strongly recommend that instead of publicly blacklisting the author in the mailbox of the president and directing the List moderation to prohibit the topic as out of scope of Unicode, an extensible and flexible framework be designed urgently under the Unicode‐CLDR umbrella to put an end to the pseudo‐localization that Richard pointed out above. OK, I’m lacking diplomatic skills too, and this e‐mail is harsh, but I see it as a true echo. And I apologize for my last reply to William Overington, if I need to. http://www.unicode.org/mail-arch/unicode-ml/y2018-m03/0118.html Besides that, I’d also suggest adding a CLDR library of character name elements allowing every existing Unicode character name to be composed in all supported locales, for use in system character pickers and special character dialogs. This library should then be updated at each major release of the UCS. Hopefully this library would then be flexible enough to avoid any Standardese, be it in English, in French, or in any language aping English Standardese. E.g. when the ISO/IEC 10646 mirror of Unicode was published in an official French version, the official translators felt partly committed to ape English Standardese, which, as we know, is due mainly not to Unicode but to the then‐head of ISO/IEC JTC1 SC2 WG2. Not to rekindle that old grudge, just to show how on‐topic that is. Be it Standardese or pseudo‐localization, the effect is always to worsen UX by missing the point. Best regards, Marcel
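The character-name-elements idea suggested above can be illustrated with a naive sketch. The `FRENCH_ELEMENTS` table and the word-by-word substitution are invented here; CLDR has no such data today:

```python
import unicodedata

# Sketch: composing a localized character name from reusable name
# elements. The FRENCH_ELEMENTS mapping is a made-up illustration,
# not any standard's data.
FRENCH_ELEMENTS = {
    "LATIN": "LATINE", "SMALL": "MINUSCULE", "CAPITAL": "MAJUSCULE",
    "LETTER": "LETTRE", "WITH": "AVEC", "ACUTE": "ACCENT AIGU",
}

def localized_name(char, elements):
    """Substitute each element of the English character name."""
    english = unicodedata.name(char)
    return " ".join(elements.get(w, w) for w in english.split())

print(unicodedata.name("\u00e9"))  # LATIN SMALL LETTER E WITH ACUTE
print(localized_name("\u00e9", FRENCH_ELEMENTS))
```

Note that this token-by-token substitution keeps English word order, while the official French names reorder the elements, which is precisely why a flexible composition framework, rather than mere token replacement, would be needed.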
Re: The Unicode Standard and ISO
I just see WG2 as a subcommittee where governments may check their practices and make minimum recommendations. Most governments are in fact very late to adopt the industry standards that evolve fast, and they just want to reduce the frequency of necessary changes, ratifying only what seems to be stable enough and gives them a long enough period to plan the transitions. So ISO 10646 has had in fact very few updates compared to Unicode (even if these Unicode changes were "synchronized", most of them remained for a long time within optional amendments that were then synchronized in ISO 10646 long after the industry had started working on updating their code for Unicode and made checks to ensure that it was stable enough to be finally included in ISO 10646 later, as the new minimal platform that governments can reasonably ask to be provided by their providers in the industry at reasonable (or no) additional cost). So I now see ISO 10646 only as a small subset of the Unicode Standard. The WG2 technical committee is just there to finally approve what can be endorsed as a standard whose usage is made mandatory in governments, when the Unicode Standard itself is still (and will remain) just optional (not a requirement). It takes months or years for new TUS features to become available on all platforms that governments use. 
WG2 probably does not really focus on technical merits, but just evaluates the implementation and deployment costs, and that's where the WG2 members decide what is reasonable for them to adopt (let's also not forget that ISO standards are mapped to national standards that reference them normatively, and these national standards (or European standards in the EEA) are legal requirements: governments then no longer need to specify each time which requirement they want; they just say that the national standards within a certain class are required for all product/service offers, and failure to implement these standards will require those providers to fix their products at no additional cost, independently of the contractual or subscribed period of support). 2018-06-08 23:28 GMT+02:00 Marcel Schneider via Unicode : > On Fri, 8 Jun 2018 13:33:20 -0700, Asmus Freytag via Unicode wrote: > > > […] > > There's no value added in creating "mirrors" of something that is > successfully being developed and maintained under a different umbrella. > > Wouldn’t the same be true for ISO/IEC 10646? It has no added value > either, and WG2 meetings could be merged with UTC meetings. > Unicode maintains the entire chain, from the roadmap to the production > tool (that the Consortium ordered without paying a full license). > > But the case is about part of the people who are eager to maintain an > alternate forum, whereas the industry (i.e. the main users of the data) > are interested in fast‐tracking character batches, and thus tend to > shortcut ISO/IEC JTC1 SC2 WG2. This is proof enough that, applying > the same logic as to ISO/IEC 15897, WG2 would be eliminated. The reason > why it was not is that Unicode was weaker and needed support > from ISO/IEC to gain enough traction, despite the then‐ISO/IEC 10646 being > useless in practice, as it pursued an unrealistic encoding scheme. 
> To overcome this, somebody in ISO started actively campaigning for the > Unicode encoding model, encountering fierce resistance from fellow > ISO people until he succeeded in teaching them real‐life computing. He had > already invented and standardized the sorting method later used > to create UCA and ISO/IEC 14651. I don’t believe that today everybody > forgot about him. > > Marcel > >
Re: The Unicode Standard and ISO
On Sat, 9 Jun 2018 08:23:33 +0200 (CEST) Marcel Schneider via Unicode wrote: > > Where there is opportunity for productive sync and merging with is > > glibc. We have had some discussions, but more needs to be done- > > especially a lot of tooling work. Currently many bug reports are > > duplicated between glibc and cldr, a sort of manual > > synchronization. Help wanted here. > > Noted. For my part, sadly for C libraries I’m unlikely to be of any > help. I wonder how much of that comes under the sad category of "better not translated". If an English speaker has to resort to search engines to understand, let alone fix, a reported problem, it may be better for a non-English speaker to search for the error message in English, and then with luck he may find a solution he can understand. In a related vein, one hears reports of people using English as the interface language, because they can't understand the messages allegedly in their native language. Richard.
Re: The Unicode Standard and ISO
On Fri, 8 Jun 2018 09:20:09 -0700, Steven R. Loomis via Unicode wrote: […] > But, it sounds like the CLDR process was successful in this case. Thank you > for contributing. You are welcome, but thanks are due to the actual corporate contributors. […] > Actually, I think the particular data item you found is relatively new. The > first values entered > for it in any language were May 18th of this year. Were there votes for > "keycap" earlier? The "keycap" category is found as early as v30 (released 2016-10-05). > Rather than a tracer finding evidence of neglect, you are at the forefront of > progressing the translated data for French. Congratulations! The neglect is on my part as I neglected to check the data history. Please note that I did not make accusations of neglect. Again: The historic Code Charts translators, partly still active, snub CLDR because Unicode is perceived as snubbing ISO/IEC 15897, so that minimal staff is actively translating CLDR for the French locale and can legitimately feel forsaken. I even made detailed suppositions as to how it could happen that "keycap" remained untranslated. […] [Unanswered questions (please refer to my other e‐mails in this thread)] > The registry for ISO/IEC 15897 has neither data for French, nor structure > that would translate the term "Characters | Category | Label | keycap". > So there would be nothing to merge with there. Correct. The only data for French is an ISO/IEC 646 charset: http://std.dkuug.dk/cultreg/registrations/number/156 As far as I can see there are available data to merge for Danish, Faroese, Finnish, Greenlandic, Norwegian, and Swedish. > So, historically, CLDR began not as a part of Unicode, but as part of Li18nx > under the Free Standards Group. See the bottom of the page > http://cldr.unicode.org/index/acknowledgments > "The founding members of the workgroup were IBM, Sun and OpenOffice.org". 
> What we were trying to do was to provide internationalized content for Linux, > and also, to resolve the then-disparity between locale data > across platforms. Locale data was very divergent between platforms - spelling > and word choice changes, etc. Comparisons were done > and a Common locale data repository (with its attendant XML formats) > emerged. That's the C in CLDR. Seed data came from IBM’s ICIR > which dates many decades before 15897 (example > http://www.computinghistory.org.uk/det/13342/IBM-National-Language-Support-Reference-Manual-Volume-2/ > - 4th edition published in 1994.) We contributed 100 locales to glibc as well. Thank you for the account and resources. The Linux Internationalization Initiative appears to have issued its last release on August 23, 2000: https://www.redhat.com/en/about/press-releases/83 the year before ISO/IEC 15897 was last updated: http://std.dkuug.dk/cultreg/registrations/chreg.htm > Where there is opportunity for productive sync and merging with is glibc. We > have had some discussions, but more needs to be > done- especially a lot of tooling work. Currently many bug reports are > duplicated between glibc and cldr, a sort of manual synchronization. > Help wanted here. Noted. For my part, sadly for C libraries I’m unlikely to be of any help. Marcel
Re: The Unicode Standard and ISO
On Fri, 8 Jun 2018 20:45:26 +0200 Philippe Verdy via Unicode wrote: > 2018-06-08 19:41 GMT+02:00 Richard Wordingham via Unicode < > unicode@unicode.org>: > The way tailoring is designed in CLDR using only data used by a > generic algorithm, and not a custom algorithm, is not the only way to > collate Lao. You can perfectly add new custom algorithm primitives > that will use new collation data rules that can be inserted as > "hooks" in UCA (which provides several points at which it is > possible, but UCA just makes these hooks act as "no-op"). The ideal is to have a common library rather than add specific routines to support specific languages. Now, this can be done in a common library; ICU break iterators have dedicated routines for CJK and for Siamese. I wonder if this could be done for Lao and possibly Tai Lue. I've a vague recollection that UCA collation for Tai Lue in the New Tai Lue script only needs thousands of contractions, so it may work well enough in the main CLDR collation algorithm. Martin Hosken provided the numbers, probably on the Unicore list, when New Tai Lue formally switched from phonetic to visual order. Taking the definition of logical order literally, the change legitimised the logical order of New Tai Lue. > You can be much faster if you create a specific library for Lao, that > would still be able to process the basic collation rules and then > make more advanced inferences based on larger cluster boundaries than > just those considered in the standard basic UCA, so it is perfectly > possible to extend it to cover more complex Lao syllables and various > specific quirks (such as hyphenation in the middle of clusters, as > seen in some Indic scripts using left matras). How is this hyphenation done? The answer probably belongs in the thread entitled 'Hyphenation Markup', unless it's restricted to the visual order scripts. 
If it's occurring in the visual order scripts, we may need to add contractions for ; U+00AD breaks contractions, and, indeed, may be used for exactly that purpose, as it is generally easier to type than CGJ. While I've seen line-breaking after a left matra in Thai, I've never *seen* a hyphen after a left matra. Richard.
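The effect Richard describes, an otherwise-ignorable U+00AD breaking a contraction, can be mimicked with a toy longest-match collator. The weights and the "ch" contraction below are invented for illustration; this is not ICU's algorithm:

```python
# Toy illustration of contractions and U+00AD (SOFT HYPHEN).
# All weights are invented; real collators use UCA/DUCET-style data.
CONTRACTIONS = {"ch": 100}          # treat "ch" as one collating element
SINGLE = {"c": 10, "h": 20, "a": 1, "\u00ad": None}  # soft hyphen: ignorable

def sort_key(s):
    key, i = [], 0
    while i < len(s):
        if s[i:i+2] in CONTRACTIONS:      # longest match wins
            key.append(CONTRACTIONS[s[i:i+2]])
            i += 2
        else:
            w = SINGLE.get(s[i])
            if w is not None:             # U+00AD contributes no weight,
                key.append(w)             # but it already broke the match
            i += 1
    return key

assert sort_key("ch") == [100]            # contraction applies
assert sort_key("c\u00adh") == [10, 20]   # soft hyphen breaks it
```

The second assertion shows why U+00AD can serve as an easier-to-type stand-in for CGJ when the writer wants to suppress a contraction.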
Re: The Unicode Standard and ISO
On Fri, 8 Jun 2018 14:14:51 -0700 "Steven R. Loomis via Unicode" wrote: > > But the consortium has formally dropped the commitment to DUCET in > > CLDR. Even when restricted to strings of assigned characters, the > > CLDR and ICU no longer make the effort to support the DUCET > > collation. > CLDR is not a collation implementation, it is a data repository with > associated specification. It was never required to 'support' DUCET. > The contents of CLDR have no bearing on whether implementations > support DUCET. DUCET used to be the root collation of CLDR. > CLDR ≠ ICU. DUCET is a standard collation. Language-specific collations are stored in CLDR, so why not an international standard? Does ICU store collations not defined in CLDR? The formal snag is that the collations have to be LDML tailorings of the CLDR root collation, which is a formal problem for U+FDD0. I would expect you to argue that it is more useful for U+FDD0 to have the special behaviour defined in CLDR, and restrict conformance with DUCET to characters other than non-characters. > On Fri, Jun 8, 2018 at 10:41 AM, Richard Wordingham via Unicode < > unicode@unicode.org> wrote: > > On Fri, 8 Jun 2018 13:40:21 +0200 > > Mark Davis ☕️ wrote: > > > > The UCA contains features essential for respecting canonical > > > > equivalence. ICU works hard to avoid the extra effort involved, > > > > apparently even going to the extreme of implicitly declaring > > > > that Vietnamese is not a human language. > > > A bit over the top, eh? > > > > Then remove the "no known language" from the bug list > What does this refer to? http://userguide.icu-project.org/collation/customization Under the heading "Known Limitations" it says: "The following are known limitations of the ICU collation implementation. These are theoretical limitations, however, since there are no known languages for which these limitations are an issue. However, for completeness they should be fixed in a future version after 1.8.1. 
The examples given are designed for simplicity in testing, and do not match any real languages." Then, the particular problem is listed under the heading "Contractions Spanning Normalization". The assumption is that FCD strings do not need to be decomposed. This comes unstuck when what is locally a secondary weight due to a diacritic on a vowel has to be promoted to a primary weight to support syllable by syllable collation in a system not set up for such a tiered comparison. > > …ICU isn't > > fast enough to load a collation from customisation - it takes > > hours! > > ICU is, alas, ridiculously slow > I'm also curious what this refers to, perhaps it should be a separate > ICU bug? There may be reproducibility issues. A proper bug report will take some work. There's also the argument that nearly 200,000 contractions is excessive. I had to disable certain checks that were treating "should not" as a prohibition - working round them either exceeded ICU's capacity because of the necessary increase in the number of contractions, or was incompatible with the design of the collation. The weight customisation creates 45 new weights, with lines like "&\u0EA1 = \ufdd2\u0e96 < \ufdd2\u0e97 # MO for THO_H & THO_L" I use strings like \ufdd2\u0e96 to emulate ISO/IEC 14651 (primary) weights. I carefully reuse default Lao weights so as to keep collating elements' list of collation elements short. There are a total of 187174 non-comment lines, most being simple contractions like "&\u0ec8\ufdd2\u0e96\ufdd2AAW\ufdd3\u0e94 = \u0ec8\u0e96\u0ead\u0e94 # 1+K+AW+N N is mandatory!" and prefix contractions like "&\ufdd2AAW\ufdd3\u0e81\u0ec9 = \u0e96\u0ec9 | ອ\u0e81 # K+1|ອ+N N is mandatory". I strip the comments off as I convert the collation definition to UTF-16; if I remember correctly I also have to convert escape sequences to characters. That processing is a negligible part of the time. By comparison, the loading of 30,000 lines from allkeys.txt is barely discernible. 
Generating the collation for loading was reasonably fast when I generated DUCET-style collation weights using bash. For my purposes, I would get better performance if ICU's collation just blindly converted strings to NFD, but then all I am using it for is to compare collation rules against a dictionary. I suspect it's just that I lose out massively as a result of ICU's tradeoffs. Richard.
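The canonical-equivalence issue underlying the FCD discussion above can be seen with Python's standard library alone: two encodings of the same letter differ as raw code points, and only decomposition (NFD) makes them compare equal, which is why a collator that skips decomposition risks mis-ordering such strings:

```python
import unicodedata

# "ê" can be encoded precomposed (U+00EA) or as "e" plus a combining
# circumflex; a naive code-point comparison treats them as different.
precomposed = "\u00ea"        # ê
decomposed  = "e\u0302"       # e + COMBINING CIRCUMFLEX ACCENT

assert precomposed != decomposed                     # raw code points differ

nfd = lambda s: unicodedata.normalize("NFD", s)
assert nfd(precomposed) == nfd(decomposed)           # canonically equivalent
```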
Re: The Unicode Standard and ISO
On 6/8/2018 2:28 PM, Marcel Schneider via Unicode wrote: On Fri, 8 Jun 2018 13:33:20 -0700, Asmus Freytag via Unicode wrote: […] There's no value added in creating "mirrors" of something that is successfully being developed and maintained under a different umbrella. Wouldn’t the same be true for ISO/IEC 10646? It has no added value either, and WG2 meetings could be merged with UTC meetings. Unicode maintains the entire chain, from the roadmap to the production tool (that the Consortium ordered without paying a full license). Without going into a lot of historical detail, the situations are not comparable; I don't think I agree with the way you summarize things here, but unfortunately I do not have the time to elaborate further. It suffices to note that 10646 was and is a special case. Not every attempt at standardization has to happen at ISO. Even on a treaty level there have always been other organizations, for example ITU. Almost the worst thing you can do is duplicating an existing and well-established effort (by which I mean not a paper effort, but one that is being implemented widely). Doing so just adds needless complexity, but it will always satisfy people who are engaging in the kind of turf-war that makes them feel important. A./ But the case is about part of the people who are eager to maintain an alternate forum, whereas the industry (i.e. the main users of the data) are interested in fast‐tracking character batches, and thus tend to shortcut ISO/IEC JTC1 SC2 WG2. This is proof enough that, applying the same logic as to ISO/IEC 15897, WG2 would be eliminated. The reason why it was not is that Unicode was weaker and needed support from ISO/IEC to gain enough traction, despite the then‐ISO/IEC 10646 being useless in practice, as it pursued an unrealistic encoding scheme. 
To overcome this, somebody in ISO started actively campaigning for the Unicode encoding model, encountering fierce resistance from fellow ISO people until he succeeded in teaching them real‐life computing. He had already invented and standardized the sorting method later used to create UCA and ISO/IEC 14651. I don’t believe that today everybody forgot about him. Marcel
Re: The Unicode Standard and ISO
On Fri, 8 Jun 2018 16:54:20 -0400, Tom Gewecke via Unicode wrote: > > > On Jun 8, 2018, at 9:52 AM, Marcel Schneider via Unicode wrote: > > > > People relevant to projects for French locale do trace the borderline of > > applicability wider > > than do those people who are closely tied to Unicode‐related projects. > > Could you give a concrete example or two of what these people mean by “wider > borderline of applicability” > that might generate their ethical dilemma? > Drawing the borderline up to which ISO/IEC should be among the involved parties, as I put it, is about the Unicode policy as to how ISO/IEC JTC1 SC2 WG2 is involved in the process, how it appears in public (FAQs, Mailing List responding practice, and so on), and how people in that WG2 feel with respect to Unicode. That may be different depending on the standard concerned (ISO/IEC 10646, ISO/IEC 14651), so that the former is put in the first place as vital to Unicode, while the latter is almost entirely hidden (except in appendix B of UTS #10). Then when it comes to locale data, Unicode people see the borderline below, while ISO people tend to see it above. This is why Unicode people do not want the twin‐standards‐bodies‐principle applied to locale data, and are ignoring or declining any attempt to equalize situations, arguing that ISO/IEC 15897 is useless. As I pointed out in my previous e‐mail responding to Asmus Freytag, ISO/IEC 10646 was about as useless until Unicode came on it and merged itself with that UCS embryo (not to say that miscarriage on the way). The only thing WG2 could insist upon were names and huge bunches of precomposed or preformatted characters that Unicode was designed to support in plain text by other means. The essential part was Unicode’s, and without Unicode we wouldn’t have any usable UCS. ISO/IEC 15897 appears to be in a similar position: not very useful, not very performant, not very complete. But an ISO/IEC standard. 
Logically, Unicode should feel committed to merge with it the same way it did with the other standard, maintaining the data, and publishing periodical abstracts under ISO coverage. There is no problem in publishing a framework standard under the ISO/IEC umbrella, associated with a regular up‐to‐date snapshot of the data. That is what I mean when I say that Unicode arbitrarily draws borderlines of its own, regardless of how people at ISO feel about them. Marcel
Re: The Unicode Standard and ISO
On Fri, 8 Jun 2018 13:33:20 -0700, Asmus Freytag via Unicode wrote: > […] > There's no value added in creating "mirrors" of something that is > successfully being developed and maintained under a different umbrella. Wouldn’t the same be true for ISO/IEC 10646? It has no added value either, and WG2 meetings could be merged with UTC meetings. Unicode maintains the entire chain, from the roadmap to the production tool (that the Consortium ordered without paying a full license). But the case is about part of the people who are eager to maintain an alternate forum, whereas the industry (i.e. the main users of the data) are interested in fast‐tracking character batches, and thus tend to shortcut ISO/IEC JTC1 SC2 WG2. This is proof enough that, applying the same logic as to ISO/IEC 15897, WG2 would be eliminated. The reason why it was not is that Unicode was weaker and needed support from ISO/IEC to gain enough traction, despite the then‐ISO/IEC 10646 being useless in practice, as it pursued an unrealistic encoding scheme. To overcome this, somebody in ISO started actively campaigning for the Unicode encoding model, encountering fierce resistance from fellow ISO people until he succeeded in teaching them real‐life computing. He had already invented and standardized the sorting method later used to create UCA and ISO/IEC 14651. I don’t believe that everybody has forgotten about him today. Marcel
Re: The Unicode Standard and ISO
Richard, > But the consortium has formally dropped the commitment to DUCET in CLDR. > Even when restricted to strings of assigned characters, the > CLDR and ICU no longer make the effort to support the DUCET > collation. CLDR is not a collation implementation, it is a data repository with associated specification. It was never required to 'support' DUCET. The contents of CLDR have no bearing on whether implementations support DUCET. CLDR ≠ ICU. On Fri, Jun 8, 2018 at 10:41 AM, Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On Fri, 8 Jun 2018 13:40:21 +0200 > Mark Davis ☕️ wrote: > > > > The UCA contains features essential for respecting canonical > > > equivalence. ICU works hard to avoid the extra effort involved, > > > apparently even going to the extreme of implicitly declaring that > > > Vietnamese is not a human language. > > > A bit over the top, eh? > > Then remove the "no known language" from the bug list > What does this refer to? > > …ICU isn't > fast enough to load a collation from customisation - it takes hours! … > ICU is, alas, ridiculously slow > I'm also curious what this refers to, perhaps it should be a separate ICU bug?
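The multi-level weighting that DUCET and the CLDR root collation share can be sketched with invented two-level weights (not real DUCET data): base-letter differences are compared first across the whole string, and accent differences only break ties.

```python
# Toy two-level sort key in the UCA/DUCET style. The weights are
# invented for illustration: primary = base letter, secondary = accent.
WEIGHTS = {"a": (1, 0), "\u00e1": (1, 1), "b": (2, 0)}  # á = accented a

def sort_key(s):
    primaries = tuple(WEIGHTS[c][0] for c in s)
    secondaries = tuple(WEIGHTS[c][1] for c in s)
    return (primaries, secondaries)     # accents only decide ties

words = ["b", "\u00e1", "a", "ab", "\u00e1a"]
print(sorted(words, key=sort_key))      # ['a', 'á', 'áa', 'ab', 'b']
```

Note that "áa" sorts before "ab" even though "á" carries an accent: at the primary level only the base letters a/b matter, which is exactly the level separation the UCA specifies.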
Re: The Unicode Standard and ISO
> On Jun 8, 2018, at 9:52 AM, Marcel Schneider via Unicode > wrote: > > People relevant to projects for French locale do trace the borderline of > applicability wider > than do those people who are closely tied to Unicode‐related projects. Could you give a concrete example or two of what these people mean by “wider borderline of applicability” that might generate their ethical dilemma?
Re: The Unicode Standard and ISO
On 6/8/2018 5:01 AM, Michael Everson via Unicode wrote: and achieving a fullscale merger with ISO/IEC 15897, after which the valid data stay hosted entirely in CLDR, and ISO/IEC 15897 would be its ISO mirror. I wonder if Mark Davis will be quick to agree with me when I say that ISO/IEC 15897 has no use and should be withdrawn I don't know about Mark, but that would have been my position. There's no value added in creating "mirrors" of something that is successfully being developed and maintained under a different umbrella. A./
Re: The Unicode Standard and ISO
2018-06-08 19:41 GMT+02:00 Richard Wordingham via Unicode < unicode@unicode.org>: > On Fri, 8 Jun 2018 13:40:21 +0200 > Mark Davis ☕️ wrote: > > > Mark > > > > On Fri, Jun 8, 2018 at 10:06 AM, Richard Wordingham via Unicode < > > unicode@unicode.org> wrote: > > > > > On Fri, 8 Jun 2018 05:32:51 +0200 (CEST) > > > Marcel Schneider via Unicode wrote: > > > > > > > Thank you for confirming. All witnesses concur to invalidate the > > > > statement about uniqueness of ISO/IEC 10646 ‐ Unicode synchrony. — > > > > After being invented in its actual form, sorting was standardized > > > > simultaneously in ISO/IEC 14651 and in Unicode Collation > > > > Algorithm, the latter including practice‐oriented extra > > > > features. > > > > > > The UCA contains features essential for respecting canonical > > > equivalence. ICU works hard to avoid the extra effort involved, > > > apparently even going to the extreme of implicitly declaring that > > > Vietnamese is not a human language. > > > A bit over the top, eh? > > Then remove the "no known language" from the bug list, or declare that > you don't know SE Asian languages. > > The root problem is that the UCA cannot handle syllable by syllable > comparisons; if the UCA could handle that, the correct collation of > unambiguous true Lao would become simple. The CLDR algorithm provides > just enough memory to make Lao collation possible; however, ICU isn't > fast enough to load a collation from customisation - it takes hours! > One could probably do better if one added suffix contractions, but > adding that capability might be a nightmare. The way tailoring is designed in CLDR (using only data consumed by a generic algorithm, not a custom algorithm) is not the only way to collate Lao. You can perfectly well add new custom algorithm primitives that use new collation data rules, inserted as "hooks" into the UCA (which provides several points where this is possible; the standard UCA just makes these hooks act as no-ops).
You can be much faster if you create a specific library for Lao that would still be able to process the basic collation rules and then make more advanced inferences based on larger cluster boundaries than just those considered in the standard basic UCA, so it is perfectly possible to extend it to cover more complex Lao syllables and various specific quirks (such as hyphenation in the middle of clusters, as seen in some Indic scripts using left matras). Not everything has to be specified by the UCA itself, notably if it's specific to a script (or sometimes only a single locale, i.e. a specific combination of a script, language, orthographic convention, and stylistic convention for some kinds of documents or presentations).
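[Editor's illustration] The "segment first, then collate" architecture Philippe describes can be sketched in a toy form. Nothing below is real Lao processing: the cluster grammar, the reordering rule, and the sample words are all invented for illustration. The point is only the shape of the design, namely a custom segmentation hook that produces cluster-level keys which a generic comparison step then consumes.

```python
import re

# Toy "cluster-first" collation sketch (hypothetical, NOT real Lao rules).
# Pretend grammar: a cluster is an optional visually-leading vowel 'e'
# followed by a consonant and trailing vowels. As in Lao/Thai logical
# order, the leading vowel sorts AFTER its consonant.
CLUSTER = re.compile(r"(e?)([bcdfgk])([aiou]*)")

def cluster_key(text):
    """Segment into clusters, reordering each into logical weight order."""
    key = []
    for prevowel, consonant, vowels in CLUSTER.findall(text):
        # consonant weight first, then vowels; the visual pre-vowel
        # moves to the end of the cluster's vowel weights
        key.append((consonant, vowels + prevowel))
    return key

words = ["eka", "ka", "ki", "eba"]
print(sorted(words, key=cluster_key))   # cluster-aware order
```

A plain code-point sort would put both "e..." words first; the cluster-aware key instead groups "ka", "eka", "ki" together under the consonant "k", which is the kind of result syllable-by-syllable comparison is meant to produce.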
Re: The Unicode Standard and ISO
On Fri, 8 Jun 2018 13:40:21 +0200 Mark Davis ☕️ wrote: > Mark > > On Fri, Jun 8, 2018 at 10:06 AM, Richard Wordingham via Unicode < > unicode@unicode.org> wrote: > > > On Fri, 8 Jun 2018 05:32:51 +0200 (CEST) > > Marcel Schneider via Unicode wrote: > > > > > Thank you for confirming. All witnesses concur to invalidate the > > > statement about uniqueness of ISO/IEC 10646 ‐ Unicode synchrony. — > > > After being invented in its actual form, sorting was standardized > > > simultaneously in ISO/IEC 14651 and in Unicode Collation > > > Algorithm, the latter including practice‐oriented extra > > > features. > > > > The UCA contains features essential for respecting canonical > > equivalence. ICU works hard to avoid the extra effort involved, > > apparently even going to the extreme of implicitly declaring that > > Vietnamese is not a human language. > A bit over the top, eh? Then remove the "no known language" from the bug list, or declare that you don't know SE Asian languages. The root problem is that the UCA cannot handle syllable-by-syllable comparisons; if the UCA could handle that, the correct collation of unambiguous true Lao would become simple. The CLDR algorithm provides just enough memory to make Lao collation possible; however, ICU isn't fast enough to load a collation from customisation - it takes hours! One could probably do better if one added suffix contractions, but adding that capability might be a nightmare. > I'm guessing you mean https://unicode.org/cldr/trac/ticket/10868, > which nicely outlines a proposal for dealing with a number of > problems with Vietnamese. It still includes a brute force work-around. > We clearly don't support every sorting feature that various > dictionaries and agencies come up with. Sometimes it is because we > can't (yet) see a good way to do it: > 1.
it might not be determinable: many governmental standards or > style sheets require "interesting" sorting, such as determining that > "XI" is a roman numeral (not the president of China) and sorting as > 11, or when "St." is meant to be Street *and* when meant to be Saint > (St. Stephen's St.) I believe the first is a character identity issue. Some of us see the difference between U+0058 LATIN CAPITAL LETTER X and the discouraged U+2169 ROMAN NUMERAL TEN as more than just a round-tripping difference. For example, by hand, I write the 'V' in 'Henry V' with a regnal number quite differently to 'Henry V.' where 'V' is short for a name. > > > Since then, > > > these two standards are kept in synchrony uninterruptedly. > > But the consortium has formally dropped the commitment to DUCET in > > CLDR. Even when restricted to strings of assigned characters, the > > CLDR and ICU no longer make the effort to support the DUCET > > collation. Indeed, I'm not even sure that the DUCET is a tailoring > > of the root CLDR collation, even when restricted to assigned > > characters. Tailorings tend to have odd side effects; fortunately, > > they rarely if ever matter. CLDR root is a rewrite with > > modifications of DUCET; it has changes that are prohibited as > > 'tailorings'! > CLDR does make some tailorings to the DUCET to create its root > collation, notably adding special contractions of private use > characters to allow for tailoring support and indexes [ > http://unicode.org/reports/tr35/tr35-collation.html#File_Format_FractionalUCA_txt > ] plus the rearrangement of some characters (mostly punctuation and > symbols) to allow runtime parametric reordering of groups of > characters (eg to put numbers after letters) [ > http://unicode.org/reports/tr35/tr35-collation.html#grouping_classes_of_characters > ]. My main point is that for practical purposes (i.e. ICU), Unicode has moved away from ISO/IEC 14651. The difference is small. I didn't say that there weren't good reasons.
>- If there are other changes that are not well documented, or if > you think those features are causing problems in some way, please > file a ticket. Well, I don't have to use DUCET, though I've found it easier for unmaintainable tailorings. I need to write code to apply non-parametric LDML tailorings - ICU is, alas, ridiculously slow. I hope that's just a matter of optimisation balance between compiling a tailoring and applying it. Are there any published compliance tests for non-parametric tailorings? I'm not sure how one would check that an alleged parametric reordering of numbers and letters applied to a tailoring of DUCET was in accordance with the LDML definition, but I don't think you want to expend money sorting that out. >- If there is a particular change that you think is not conformant > to UCA, please also file that. Sorry, I must have scanned the conformance requirements too quickly. I had got it into my head that someone had recklessly required that tailorings be in accordance with LDML. That constraint only applies to parametric tailorings, so any properly structured unambiguously
Re: The Unicode Standard and ISO
Marcel, On Fri, Jun 8, 2018 at 6:52 AM, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > > What got me started is that before even I requested a submitter ID (and > the reason why I’ve requested one), > "Characters | Category | Label | keycap" remained untranslated, i.e. its > French translation was "keycap". > When I proposed "cabochon", the present contributors kindly upvoted or > proposed "touche" even before I > launched a forum thread, and when I became aware, I changed my vote and > posted the rationale on the forum, > so the upvoting contributor kindly followed so that now we stay united for > "touche", rather than "keycap". > But it sounds like the CLDR process was successful in this case. Thank you for contributing. > Please note that I acknowledge everybody and don’t criticize anybody. It > doesn’t require much imagination > to figure out that when CLDR was set up, there were so few or even no > French contributors that translating > "keycap" either fell out of deadline or was overlooked or whatever, and > later passed unnoticed. That is a > tracer detecting that none of the people setting up the French translation > of the Code Charts were ever on > the CLDR project. Because if anybody of them had been active on CLDR, no > English word would have been > kept in use mistakenly for the French locale. > Actually, I think the particular data item you found is relatively new. The first values entered for it in any language were May 18th of this year. Were there votes for "keycap" earlier? Rather than a tracer finding evidence of neglect, you are at the forefront of progressing the translated data for French. Congratulations! > French contributors are not "prevented from cooperating". Where do you get this from? Who do you mean? > > Historic French contributors are ethically prevented from contributing to > CLDR, because of a strong commitment to involve ISO/IEC, > a notion that is very meaningful to Unicode.
People relevant to projects > for French locale do trace the borderline of applicability wider > than do those people who are closely tied to Unicode‐related projects. Which contributors specifically are prevented? > > There were not "many attempts" at a merger, and Unicode didn't "refuse" > anything. Who do you think "attempted", and when? > > An influential person consistently campaigned for a merger of CLDR and > ISO/IEC 15897, but that never succeeded. It’s unlikely to be ignored. Which person? > Albeit given the state of ISO/IEC 15897, there was nothing such a merger > would have contributed anyway. > > I’ve taken a glance at the data of ISO/IEC 15897 and cannot figure out that > there is nothing to pick from. At least they won’t be disposed to > sell you "keycap" as a French term or as being in any use in that target > locale. And anyhow, the gesture would be appreciated as a piece > of good diplomacy. Hopefully a lightweight proceeding could end up in that > data being transferred to CLDR, and this being cited as sole > normative reference in ISO/IEC 15897. As a result, everybody’s happy. > The registry for ISO/IEC 15897 has neither data for French, nor structure that would translate the term "Characters | Category | Label | keycap". So there would be nothing to merge with there. So, historically, CLDR began not as a part of Unicode, but as part of Li18nux under the Free Standards Group. See the bottom of the page http://cldr.unicode.org/index/acknowledgments "The founding members of the workgroup were IBM, Sun and OpenOffice.org". What we were trying to do was to provide internationalized content for Linux, and also, to resolve the then-disparity between locale data across platforms. Locale data was very divergent between platforms - spelling and word choice changes, etc. Comparisons were done and a Common locale data repository (with its attendant XML formats) emerged. That's the C in CLDR.
Seed data came from IBM’s ICIR, which dates many decades before 15897 (example http://www.computinghistory.org.uk/det/13342/IBM-National-Language-Support-Reference-Manual-Volume-2/ - 4th edition published in 1994.) We contributed 100 locales to glibc as well. Where there is opportunity for productive sync and merging is with glibc. We have had some discussions, but more needs to be done, especially a lot of tooling work. Currently many bug reports are duplicated between glibc and cldr, a sort of manual synchronization. Help wanted here. Steven
Re: The Unicode Standard and ISO
On Fri, 8 Jun 2018 08:50:28 -0400, Tom Gewecke via Unicode wrote: > > > > On Jun 7, 2018, at 11:32 PM, Marcel Schneider via Unicode wrote: > > > > What bothered me ... is that the registration of the French locale in CLDR > > is > > still surprisingly incomplete > > Could you provide an example or two? > What got me started is that "Characters | Category | Label | keycap" remained untranslated, i.e. its French translation was "keycap". A number of keyword translations are missing or wrong. I can tell that all the current contributors are working hard to fix the issues. I can imagine that it’s by lack of time in front of the huge mass of data, or by feeling so alone (only three corporate contributors, no liaison or NGOs). No wonder if the official French translators are reportedly all shunning the job (that is a report, not my own inference). Marcel
Re: The Unicode Standard and ISO
On Fri, 8 Jun 2018 13:06:18 +0200, Mark Davis ☕️ via Unicode wrote: > > Where are you getting your "facts"? Among many unsubstantiated or ambiguous > claims in that very long sentence: > > > "French locale in CLDR is still surprisingly incomplete". > > For each release, the data collected for the French locale is complete to the > bar we have set for Level=Modern. What got me started is that before even I requested a submitter ID (and the reason why I’ve requested one), "Characters | Category | Label | keycap" remained untranslated, i.e. its French translation was "keycap". When I proposed "cabochon", the present contributors kindly upvoted or proposed "touche" even before I launched a forum thread, and when I became aware, I changed my vote and posted the rationale on the forum, so the upvoting contributor kindly followed so that now we stay united for "touche", rather than "keycap". Please note that I acknowledge everybody and don’t criticize anybody. It doesn’t require much imagination to figure out that when CLDR was set up, there were so few or even no French contributors that translating "keycap" either fell out of deadline or was overlooked or whatever, and later passed unnoticed. That is a tracer detecting that none of the people setting up the French translation of the Code Charts were ever on the CLDR project. Because if anybody of them had been active on CLDR, no English word would have been kept in use mistakenly for the French locale. Beyond what everybody on this List is able to decrypt on his or her own, I’m not in a position to disclose any further personal information, for witness protection’s sake. > What you may mean is that CLDR doesn't support a structure that you think it > should. > For that, you have to make a compelling case that the structure you propose > is worth it, worth diverting people from other priorities.
Thank you, that is not a problem and may be resolved after filing a ticket, which would be done for a later release, given that top priority tasks require a potentially huge amount of work. First NBSP and NNBSP need to be added to the French charset (see http://unicode.org/cldr/trac/ticket/11120 ). Adding centuries to Date (with French short form "s.") is of interest for any locale, but irrelevant to everyday business practice. > > French contributors are not "prevented from cooperating". Where do you get > this from? Who do you mean? Historic French contributors are ethically prevented from contributing to CLDR, because of a strong commitment to involve ISO/IEC, a notion that is very meaningful to Unicode. People relevant to projects for French locale do trace the borderline of applicability wider than do those people who are closely tied to Unicode‐related projects. > > We have had many French speakers contribute data over time. When finding the word "keycap" as a French translation of "keycap" in my copy of CLDR data at home, I wanted to know who contributed that data. I was told that when survey is open, I’ll see who is contributing. I won’t blame those who are helping resolve the issue now. > Now, it works better when people engage under the umbrella of an > organization, but even there that doesn't have to be a company; > we have liaison relationships with government agencies and NGOs. That’s fine. But even as a guest I’m well received, and anyhow the point is to bring the arguments. My concern is that starting with a good translation from scratch is more efficient than attempting to correct the same error(s) across multiple instances via the survey tool, which seems to be designed to fix small errors rather than to redesign entire parts of the scheme. > > There were not "many attempts" at a merger, and Unicode didn't "refuse" > anything. Who do you think "attempted", and when?
An influential person consistently campaigned for a merger of CLDR and ISO/IEC 15897, but that never succeeded. It’s unlikely to be ignored. > > Albeit given the state of ISO/IEC 15897, there was nothing such a merger > would have contributed anyway. I’ve taken a glance at the data of ISO/IEC 15897 and cannot figure out that there is nothing to pick from. At least they won’t be disposed to sell you "keycap" as a French term or as being in any use in that target locale. And anyhow, the gesture would be appreciated as a piece of good diplomacy. Hopefully a lightweight proceeding could end up in that data being transferred to CLDR, and this being cited as sole normative reference in ISO/IEC 15897. As a result, everybody’s happy. > BTW, your use of the term "refuse" might be a language issue. I don't > "refuse" to respond > to the widow of a Nigerian Prince who wants to give me $1M. Since I don't > think it is worth my time, > or am not willing to upfront the low, low fee of $10K, I might "ignore" the > email, or "not respond" to it. > Or I might "decline" it with a no-thanks or not-interested response. But none > of that is to "refuse"
Re: The Unicode Standard and ISO
> On Jun 7, 2018, at 11:32 PM, Marcel Schneider via Unicode > wrote: > > What bothered me ... is that the registration of the French locale in CLDR is > still surprisingly incomplete Could you provide an example or two?
Re: The Unicode Standard and ISO
On 8 June 2018 at 13:01, Michael Everson via Unicode wrote: > > I wonder if Mark Davis will be quick to agree with me when I say that > ISO/IEC 15897 has no use and should be withdrawn. It was reviewed and confirmed in 2017, so the next systematic review won't be until 2022. And as the standard is now under SC35, national committees mirroring SC2 may well overlook (or be unable to provide feedback to) the systematic review when it next comes around. I agree that ISO/IEC 15897 has no use, and should be withdrawn. Andrew
Re: The Unicode Standard and ISO
On 8 Jun 2018, at 04:32, Marcel Schneider via Unicode wrote: > the registration of the French locale in CLDR is still surprisingly > incomplete despite the meritorious efforts made by the actual contributors Nothing prevents people from working to complete the French locale in CLDR. Synchronization with an unused ISO standard is not necessary to do that. Michael Everson
Re: The Unicode Standard and ISO
On 7 Jun 2018, at 20:13, Marcel Schneider via Unicode wrote: > On Fri, 18 May 2018 00:29:36 +0100, Michael Everson via Unicode responded: >> >> It would be great if mutual synchronization were considered to be of benefit. >> Some of us in SC2 are not happy that the Unicode Consortium has published >> characters >> which are still under Technical ballot. And this did not happen only once. > > I’m not happy catching up this thread out of time, the less as it ultimately > brings me where I’ve started > in 2014/2015: to the wrong character names that the ISO/IEC 10646 merger > infiltrated into Unicode. Many things have more than one name. The only truly bad misnomers from that period were related to a mapping error, namely in the treatment of Latvian characters, which are called CEDILLA rather than COMMA BELOW. > This is the very thing I did not vent in my first reply. From my point of > view, this misfortune would be > reason enough for Unicode not to seek further cooperation with ISO/IEC. This is absolutely NOT what we want. What we want is for the two parties to remember that industrial concerns and public concerns work best together. > But I remember the many voices raising on this List to tell me that this is > all over and forgiven. I think you are digging up an old grudge that nobody thinks about any longer. > Therefore I’m confident that the Consortium will have the mindfulness to > complete the ISO/IEC JTC 1 > partnership by publicly assuming synchronization with ISO/IEC 14651, There is no trouble with ISO/IEC 14651. > and achieving a fullscale merger with ISO/IEC 15897, after which the valid > data stay hosted entirely in CLDR, and ISO/IEC 15897 would be its ISO mirror. I wonder if Mark Davis will be quick to agree with me when I say that ISO/IEC 15897 has no use and should be withdrawn. Michael Everson
Re: The Unicode Standard and ISO
Mark On Fri, Jun 8, 2018 at 10:06 AM, Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On Fri, 8 Jun 2018 05:32:51 +0200 (CEST) > Marcel Schneider via Unicode wrote: > > > Thank you for confirming. All witnesses concur to invalidate the > > statement about uniqueness of ISO/IEC 10646 ‐ Unicode synchrony. — > > After being invented in its actual form, sorting was standardized > > simultaneously in ISO/IEC 14651 and in Unicode Collation Algorithm, > > the latter including practice‐oriented extra features. > > The UCA contains features essential for respecting canonical > equivalence. ICU works hard to avoid the extra effort involved, > apparently even going to the extreme of implicitly declaring that > Vietnamese is not a human language. A bit over the top, eh? > (Some contractions are not > supported by ICU!) I'm guessing you mean https://unicode.org/cldr/trac/ticket/10868, which nicely outlines a proposal for dealing with a number of problems with Vietnamese. We clearly don't support every sorting feature that various dictionaries and agencies come up with. Sometimes it is because we can't (yet) see a good way to do it: 1. it might not be determinable: many governmental standards or style sheets require "interesting" sorting, such as determining that "XI" is a roman numeral (not the president of China) and sorting as 11, or when "St." is meant to be Street *and* when meant to be Saint (St. Stephen's St.) 2. the prospective cost in memory, code complexity, or performance, or the time necessary to figure out how to do complex requirements, doesn't seem to warrant adding it at this point. Now, if you or others are interested in proposing specific patches to address certain issues, then you can propose that. Best to make a proposal (ticket) before doing the work, because if the solution is very intricate, even the time necessary to evaluate the patch can be too much to fit into the schedule.
For that reason, it is best to break up such tickets into small, tractable pieces. > The synchronisation is manifest in the DUCET collation, which seems to make the effort to ensure that some canonical > equivalent will sort the same way under ISO/IEC 14651. > > > Since then, > > these two standards are kept in synchrony uninterruptedly. > > But the consortium has formally dropped the commitment to DUCET in > CLDR. Even when restricted to strings of assigned characters, the CLDR > and ICU no longer make the effort to support the DUCET collation. > Indeed, I'm not even sure that the DUCET is a tailoring of the root CLDR > collation, even when restricted to assigned characters. Tailorings > tend to have odd side effects; fortunately, they rarely if ever matter. > CLDR root is a rewrite with modifications of DUCET; it has changes that > are prohibited as 'tailorings'! > CLDR does make some tailorings to the DUCET to create its root collation, notably adding special contractions of private use characters to allow for tailoring support and indexes [ http://unicode.org/reports/tr35/tr35-collation.html#File_Format_FractionalUCA_txt ] plus the rearrangement of some characters (mostly punctuation and symbols) to allow runtime parametric reordering of groups of characters (e.g. to put numbers after letters) [ http://unicode.org/reports/tr35/tr35-collation.html#grouping_classes_of_characters ]. - If there are other changes that are not well documented, or if you think those features are causing problems in some way, please file a ticket. - If there is a particular change that you think is not conformant to UCA, please also file that. > Richard. > >
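[Editor's illustration] The "runtime parametric reordering" Mark mentions can be sketched in miniature. This is not ICU's implementation (ICU works with fractional collation weights and reserved weight ranges per reorder group); it is a hypothetical model showing why prefixing each primary weight with a reorder-group index makes moving a whole group, such as digits after letters, a cheap runtime permutation rather than a rewrite of every character's weight.

```python
# Minimal model of group-based parametric reordering (hypothetical,
# not ICU's data format). Each character's sort key is (group index,
# character); reordering only permutes the group indices.

GROUPS = {"punct": 0, "digit": 1, "letter": 2}   # "root" order

def group_of(ch):
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    return "punct"

def sort_key(s, reorder=None):
    """Build a per-character key; `reorder` permutes whole groups."""
    order = dict(GROUPS)
    if reorder:                      # e.g. ["punct", "letter", "digit"]
        order = {g: i for i, g in enumerate(reorder)}
    return [(order[group_of(c)], c) for c in s]

words = ["beta", "42", "alpha"]
print(sorted(words, key=sort_key))   # root order: digits before letters
print(sorted(words, key=lambda s: sort_key(s, ["punct", "letter", "digit"])))
```

The second sort moves the entire digit group after letters by changing three small integers, without touching any per-character data, which is the design rationale behind CLDR root's grouping of reorderable characters.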
Re: The Unicode Standard and ISO
Where are you getting your "facts"? Among many unsubstantiated or ambiguous claims in that very long sentence: 1. "French locale in CLDR is still surprisingly incomplete". 1. For each release, the data collected for the French locale is complete to the bar we have set for Level=Modern. 2. What you may mean is that CLDR doesn't support a structure that you think it should. For that, you have to make a compelling case that the structure you propose is worth it, worth diverting people from other priorities. 2. French contributors are not "prevented from cooperating". Where do you get this from? Who do you mean? 1. We have had many French speakers contribute data over time. Now, it works better when people engage under the umbrella of an organization, but even there that doesn't have to be a company; we have liaison relationships with government agencies and NGOs. 3. There were not "many attempts" at a merger, and Unicode didn't "refuse" anything. Who do you think "attempted", and when? 1. Albeit given the state of ISO/IEC 15897, there was nothing such a merger would have contributed anyway. 2. BTW, your use of the term "refuse" might be a language issue. I don't "refuse" to respond to the widow of a Nigerian Prince who wants to give me $1M. Since I don't think it is worth my time, or am not willing to upfront the low, low fee of $10K, I might "ignore" the email, or "not respond" to it. Or I might "decline" it with a no-thanks or not-interested response. But none of that is to "refuse" it. Mark On Fri, Jun 8, 2018 at 5:32 AM, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On Thu, 7 Jun 2018 22:46:12 +0300, Erkki I. Kolehmainen via Unicode wrote: > > > > I cannot but fully agree with Mark and Michael. > > > > Sincerely > > Thank you for confirming. All witnesses concur to invalidate the statement > about > uniqueness of ISO/IEC 10646 ‐ Unicode synchrony.
— After being invented in > its > actual form, sorting was standardized simultaneously in ISO/IEC 14651 and > in > Unicode Collation Algorithm, the latter including practice‐oriented extra > features. > Since then, these two standards are kept in synchrony uninterruptedly. > > Getting people to correct the overall response was not really my initial > concern, > however. What bothered me before I learned that Unicode refuses to > cooperate > with ISO/IEC JTC1 SC22 is that the registration of the French locale in > CLDR is > still surprisingly incomplete despite the meritorious efforts made by the > actual > contributors, and then after some investigation, that the main part of the > potential > French contributors are prevented from cooperating because Unicode refuses > to > cooperate with ISO/IEC on locale data while ISO/IEC 15897 predates CLDR, > reportedly after many attempts made to merge both standards, remaining > unsuccessful without any striking exposure or friendly agreement to avoid > kind of > an impression of unconcerned rebuff. > > Best regards, > > Marcel > >
Re: The Unicode Standard and ISO
On Fri, 8 Jun 2018 05:32:51 +0200 (CEST) Marcel Schneider via Unicode wrote: > Thank you for confirming. All witnesses concur to invalidate the > statement about uniqueness of ISO/IEC 10646 ‐ Unicode synchrony. — > After being invented in its actual form, sorting was standardized > simultaneously in ISO/IEC 14651 and in Unicode Collation Algorithm, > the latter including practice‐oriented extra features. The UCA contains features essential for respecting canonical equivalence. ICU works hard to avoid the extra effort involved, apparently even going to the extreme of implicitly declaring that Vietnamese is not a human language. (Some contractions are not supported by ICU!) The synchronisation is manifest in the DUCET collation, which seems to make the effort to ensure that some canonical equivalent will sort the same way under ISO/IEC 14651. > Since then, > these two standards are kept in synchrony uninterruptedly. But the consortium has formally dropped the commitment to DUCET in CLDR. Even when restricted to strings of assigned characters, the CLDR and ICU no longer make the effort to support the DUCET collation. Indeed, I'm not even sure that the DUCET is a tailoring of the root CLDR collation, even when restricted to assigned characters. Tailorings tend to have odd side effects; fortunately, they rarely if ever matter. CLDR root is a rewrite with modifications of DUCET; it has changes that are prohibited as 'tailorings'! Richard.
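[Editor's illustration] Richard's point about canonical equivalence is easy to demonstrate with Python's standard library. This is not the UCA (no collation weights are involved); it only shows the underlying requirement he is referring to, namely that a conformant collation must give canonically equivalent strings identical sort keys, which keying on a normalization form achieves.

```python
import unicodedata

# "é" can be encoded precomposed (U+00E9) or decomposed (e + U+0301).
# Naive code-point comparison treats them as different strings; keying
# the comparison on a canonical normalization form (here NFD) makes
# the two spellings compare equal, which is the effect a conformant
# UCA implementation must produce.

composed = "caf\u00e9"          # é as one code point
decomposed = "cafe\u0301"       # e + combining acute accent

print(composed == decomposed)   # False: different code point sequences
print(unicodedata.normalize("NFD", composed) ==
      unicodedata.normalize("NFD", decomposed))   # True: canonically equivalent

# A sort key built on the normalized form orders both spellings together:
key = lambda s: unicodedata.normalize("NFD", s)
print(sorted(["cafz", composed, decomposed, "cafa"], key=key))
```

Note that ordering by normalized code points is still not French collation (accents need secondary weights, among much else); the sketch isolates only the canonical-equivalence obligation that the thread is discussing.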
Re: The Unicode Standard and ISO
On Fri, 8 Jun 2018 00:43:04 +0200, Philippe Verdy via Unicode wrote: [cited mail] > > The "normative names" are in fact normative only as a forward reference > to the ISO/IEC repertoire, because it insists that these names are an essential > part > of the stable encoding policy (which was then integrated in the Unicode > stability rules, > so that the normative reference remains stable as well). Beside this, Unicode > has other > more useful properties. People don't care at all about these names. Effectively we have learned to live even with those that are uselessly misleading and had been pushed through against better proposals made on Unicode side, particularly the wrong left/right attributes. Unicode has worked hard to palliate these misnomers by introducing the Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type (Open, Close) properties, and specifying in TUS that beside a few exceptions, LEFT and RIGHT in names of paired punctuation is to be read as OPENING and CLOSING, respectively. > The character properties and the related algorithms that use them (and even > the representative glyph even if it's not stabilized) are much more important > (and ISO/IEC 10646 does not do anything to solve the real encoding > issues, > or provide the properties needed for correct processing). Unicode is more > based on > commonly > used practices and allows experimentation and progressive enhancement without > having > to break the agreed ISO/IEC normative properties. The position of Unicode is > more > pragmatic, and is much more open to a lot of contributors than the small ISO/IEC > subcommittees > with in fact very few active members, but it's still an interesting > counter-power that allows > governments to choose where it is more useful to contribute and have > influence when > the industry may have different needs and practices not following the > government > recommendations adopted at ISO.
Now it becomes clear to me that this opportunity for governmental action is exactly what could be useful when it comes to fixing the textual appearance of national user interfaces, and that is exactly why not federating communities around CLDR, and not attempting to make efforts converge, is so counterproductive. Thanks for getting this point out. Best regards, Marcel
RE: The Unicode Standard and ISO
On Thu, 7 Jun 2018 22:46:12 +0300, Erkki I. Kolehmainen via Unicode wrote: > > I cannot but fully agree with Mark and Michael. > > Sincerely > Thank you for confirming. All witnesses concur to invalidate the statement about uniqueness of ISO/IEC 10646 ‐ Unicode synchrony. — After being invented in its actual form, sorting was standardized simultaneously in ISO/IEC 14651 and in the Unicode Collation Algorithm, the latter including practice‐oriented extra features. Since then, these two standards have been kept in synchrony without interruption. Getting people to correct the overall response was not really my initial concern, however. What bothered me before I learned that Unicode refuses to cooperate with ISO/IEC JTC1 SC22 is that the registration of the French locale in CLDR is still surprisingly incomplete despite the meritorious efforts made by the current contributors, and then after some investigation, that most of the potential French contributors are prevented from cooperating because Unicode refuses to cooperate with ISO/IEC on locale data even though ISO/IEC 15897 predates CLDR; reportedly, many attempts were made to merge the two standards, all unsuccessful, without any public explanation or friendly agreement that might have avoided the impression of an unconcerned rebuff. Best regards, Marcel
Re: The Unicode Standard and ISO
2018-06-07 21:13 GMT+02:00 Marcel Schneider via Unicode : > On Thu, 17 May 2018 22:26:15 +, Peter Constable via Unicode wrote: > […] > > Hence, from an ISO perspective, ISO 10646 is the only standard for which > on-going > > synchronization with Unicode is needed or relevant. > > This point of view is fueled by the Unicode Standard being traditionally > thought of as a mere character set, > regardless of all efforts—lastly by first responder Asmus Freytag > himself—to widen the conception. > > On Fri, 18 May 2018 00:29:36 +0100, Michael Everson via Unicode responded: > > > > It would be great if mutual synchronization were considered to be of > benefit. > > Some of us in SC2 are not happy that the Unicode Consortium has > published characters > > which are still under Technical ballot. And this did not happen only > once. > > I’m not happy catching up this thread out of time, all the less as it > ultimately brings me back to where I started > in 2014/2015: to the wrong character names that the ISO/IEC 10646 merger > infiltrated into Unicode. > This is the very thing I did not vent in my first reply. From my point of > view, this misfortune would be > reason enough for Unicode not to seek further cooperation with ISO/IEC. > The "normative names" are in fact normative only as a forward reference to the ISO/IEC repertoire, because it insists that these names are an essential part of the stable encoding policy, which was then integrated in the Unicode stability rules, so that the normative reference remains stable as well. Beside this, Unicode has other more useful properties. People don't care at all about these names. The character properties and the related algorithms that use them (and even the representative glyph, even if it's not stabilized) are much more important (and ISO/IEC 10646 does not do anything to solve the real encoding issues and the needed properties for correct processing). 
Unicode is more based on commonly used practices and allows experimentation and progressive enhancement without having to break the agreed ISO/IEC normative properties. The position of Unicode is more pragmatic, and is much more open to a lot of contributors than the small ISO/IEC subcommittees with in fact very few active members, but it's still an interesting counter-power that allows governments to choose where it is more useful to contribute and have influence when the industry may have different needs and practices not following the government recommendations adopted at ISO.
RE: The Unicode Standard and ISO
I cannot but fully agree with Mark and Michael. Sincerely Erkki I. Kolehmainen Mannerheimintie 75 B 37, 00270 Helsinki, Finland Mob: +358 400 825 943 -Original Message- From: Unicode On Behalf Of Michael Everson via Unicode Sent: Thursday, 7 June 2018 16:29 To: unicode Unicode Discussion Subject: Re: The Unicode Standard and ISO On 7 Jun 2018, at 14:20, Mark Davis ☕️ via Unicode wrote: > > A few facts. > >> > ... Consortium refused till now to synchronize UCA and ISO/IEC 14651. > > ISO/IEC 14651 and Unicode have longstanding cooperation. Ken Whistler could > speak to the synchronization level in more detail, but the above statement is > inaccurate. Mark is right. >> > ... For another part it [sync with ISO/IEC 15897] failed because the >> > Consortium refused to cooperate, despite of repeated proposals for a >> > merger of both instances. > > I recall no serious proposals for that. Nor do I. > (And in any event — very unlike the synchrony with 10646 and 14651 — ISO > 15897 brought no value to the table. Certainly nothing to outweigh the > considerable costs of maintaining synchrony. Completely inadequate structure > for modern system requirements, no particular industry support, and scant > content: see Wikipedia for "The registry has not been updated since December > 2001”.) Mark is right. Michael Everson
Re: The Unicode Standard and ISO
On Thu, 17 May 2018 22:26:15 +, Peter Constable via Unicode wrote: […] > Hence, from an ISO perspective, ISO 10646 is the only standard for which > on-going > synchronization with Unicode is needed or relevant. This point of view is fueled by the Unicode Standard being traditionally thought of as a mere character set, regardless of all efforts—lastly by first responder Asmus Freytag himself—to widen the conception. On Fri, 18 May 2018 00:29:36 +0100, Michael Everson via Unicode responded: > > It would be great if mutual synchronization were considered to be of benefit. > Some of us in SC2 are not happy that the Unicode Consortium has published > characters > which are still under Technical ballot. And this did not happen only once. I’m not happy catching up this thread out of time, all the less as it ultimately brings me back to where I started in 2014/2015: to the wrong character names that the ISO/IEC 10646 merger infiltrated into Unicode. This is the very thing I did not vent in my first reply. From my point of view, this misfortune would be reason enough for Unicode not to seek further cooperation with ISO/IEC. But I remember the many voices raised on this List to tell me that this is all over and forgiven. Therefore I’m confident that the Consortium will have the mindfulness to complete the ISO/IEC JTC 1 partnership by publicly assuming synchronization with ISO/IEC 14651, and by achieving a full-scale merger with ISO/IEC 15897, after which the valid data would stay hosted entirely in CLDR, and ISO/IEC 15897 would be its ISO mirror. That is a matter of smart diplomacy, at which Unicode may again prove to be great. Please consider making this move. Thanks, Marcel
Re: The Unicode Standard and ISO
On Thu, 7 Jun 2018 15:20:29 +0200, Mark Davis ☕️ via Unicode wrote: > > A few facts. > > > ... Consortium refused till now to synchronize UCA and ISO/IEC 14651. > > ISO/IEC 14651 and Unicode have longstanding cooperation. Ken Whistler could > speak to the > synchronization level in more detail, but the above statement is inaccurate. > > > ... For another part it [sync with ISO/IEC 15897] failed because the > > Consortium refused to > > cooperate, despite repeated proposals for a merger of both instances. > > I recall no serious proposals for that. > > (And in any event — very unlike the synchrony with 10646 and 14651 — ISO > 15897 brought > no value to the table. Certainly nothing to outweigh the considerable costs > of maintaining synchrony. > Completely inadequate structure for modern system requirements, no particular > industry support, and > scant content: see Wikipedia for "The registry has not been updated since > December 2001".) Thank you for the correction regarding the Unicode ISO/IEC 14651 synchrony; indeed, while at http://www.unicode.org/reports/tr10/#Synch_ISO14651 we can read that “This relationship between the two standards is similar to that maintained between the Unicode Standard and ISO/IEC 10646[,]” confusingly there seems to be no related FAQ. Even more confusingly, a straightforward question like “I was wondering which ISO standards other than ISO 10646 specify the same things as the Unicode Standard” remains ultimately unanswered. The reason might be that the “and of those, which ones are actively kept in sync” part is really best answered by “none.” In fact, while UCA is synched with ISO/IEC 14651, the reverse statement is reportedly false. Hence, UCA would be what is called an implementation of ISO/IEC 14651. 
Nevertheless, UAX #10 refers to “The synchronized version of ISO/IEC 14651[,]” and mentions a “common tool[.]” Hence one simple question: Why does the fact that the Unicode-ISO synchrony encompasses *two* standards remain untold in the first place? As for ISO/IEC 15897, it would certainly be good diplomacy for Unicode to pick up the usable data from the existing set; then ISO/IEC 15897 would be in a position to cite CLDR as a normative reference, so that all potential contributors are redirected and may feel free to contribute to CLDR. And it would be nice if Unicode did not forget to add an FAQ entry about this topic, please. Thanks, Marcel
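For readers unfamiliar with why a common collation framework matters at all: plain code-point ordering is not an acceptable sort order for any natural language, which is what UCA and ISO/IEC 14651 both exist to fix. A minimal illustration in Python (the `fr_FR.UTF-8` locale name is an assumption and may not be installed on every system; real UCA implementations live in libraries such as ICU):

```python
import locale

words = ["zèbre", "école", "abbaye"]

# Raw code-point order puts 'é' (U+00E9) after 'z' (U+007A):
print(sorted(words))  # ['abbaye', 'zèbre', 'école']

# A tailored collation (here via the host system's locale data, if
# available) restores the order a French reader expects:
try:
    locale.setlocale(locale.LC_COLLATE, "fr_FR.UTF-8")
    print(sorted(words, key=locale.strxfrm))
except locale.Error:
    print("fr_FR.UTF-8 locale not available on this system")
```

The gap between the two orderings is exactly the kind of language-specific tailoring data that CLDR collects on top of the common UCA/14651 framework.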
Re: The Unicode Standard and ISO
On 7 Jun 2018, at 14:20, Mark Davis ☕️ via Unicode wrote: > > A few facts. > >> > ... Consortium refused till now to synchronize UCA and ISO/IEC 14651. > > ISO/IEC 14651 and Unicode have longstanding cooperation. Ken Whistler could > speak to the synchronization level in more detail, but the above statement is > inaccurate. Mark is right. >> > ... For another part it [sync with ISO/IEC 15897] failed because the >> > Consortium refused to cooperate, despite of repeated proposals for a >> > merger of both instances. > > I recall no serious proposals for that. Nor do I. > (And in any event — very unlike the synchrony with 10646 and 14651 — ISO > 15897 brought no value to the table. Certainly nothing to outweigh the > considerable costs of maintaining synchrony. Completely inadequate structure > for modern system requirement, no particular industry support, and scant > content: see Wikipedia for "The registry has not been updated since December > 2001”.) Mark is right. Michael Everson
Re: The Unicode Standard and ISO
A few facts. > ... Consortium refused till now to synchronize UCA and ISO/IEC 14651. ISO/IEC 14651 and Unicode have longstanding cooperation. Ken Whistler could speak to the synchronization level in more detail, but the above statement is inaccurate. > ... For another part it [sync with ISO/IEC 15897] failed because the Consortium refused to cooperate, despite repeated proposals for a merger of both instances. I recall no serious proposals for that. (And in any event — very unlike the synchrony with 10646 and 14651 — ISO 15897 brought no value to the table. Certainly nothing to outweigh the considerable costs of maintaining synchrony. Completely inadequate structure for modern system requirements, no particular industry support, and scant content: see Wikipedia for "The registry has not been updated since December 2001".) Mark On Thu, Jun 7, 2018 at 1:25 PM, Marcel Schneider via Unicode < unicode@unicode.org> wrote: > On Thu, 17 May 2018 09:43:28 -0700, Asmus Freytag via Unicode wrote: > > > > On 5/17/2018 8:08 AM, Martinho Fernandes via Unicode wrote: > > > Hello, > > > > > > There are several mentions of synchronization with related standards in > > > unicode.org, e.g. in https://www.unicode.org/versions/index.html, and > > > https://www.unicode.org/faq/unicode_iso.html. However, all such > mentions > > > never mention anything other than ISO 10646. > > > > Because that is the standard for which there is an explicit > understanding by all involved > > relating to synchronization. There have been occasionally some > challenging differences > > in the process and procedures, but generally the synchronization is > being maintained, > > something that's helped by the fact that so many people are active in > both arenas. > > Perhaps the cause-effect relationship is somewhat unclear. I think that > many people being > active in both arenas is helped by the fact that there is a strong will to > maintain synching. 
> > If there were similar policies notably for ISO/IEC 14651 (collation) and > ISO/IEC 15897 > (locale data), ISO/IEC 10646 would be far from standing alone in the field > of > Unicode-ISO/IEC cooperation. > > > > > There are really no other standards where the same is true to the same > extent. > > > > > > I was wondering which ISO standards other than ISO 10646 specify the > > > same things as the Unicode Standard, and of those, which ones are > > > actively kept in sync. This would be of importance for standardization > > > of Unicode facilities in the C++ language (ISO 14882), as reference to > > > ISO standards is generally preferred in ISO standards. > > > > > One of the areas the Unicode Standard differs from ISO 10646 is that its > conception > > of a character's identity implicitly contains that character's > properties - and those are > > standardized as well and alongside of just name and serial number. > > This is probably why, to date, ISO/IEC 10646 features character properties > by including > normative references to the Unicode Standard, Standard Annexes, and the > UCD. > Bidi-mirroring e.g. is part of ISO/IEC 10646 that specifies in clause 15.1: > > “[…] The list of these characters is determined by having the > ‘Bidi_Mirrored’ property > set to ‘Y’ in the Unicode Standard. These values shall be determined > according to > the Unicode Standard Bidi Mirrored property (see Clause 2).” > > > > > Many of these properties have associated with them algorithms, e.g. the > bidi algorithm, > > that are an essential element of data interchange: if you don't know > which order in > > the backing store is expected by the recipient to produce a certain > display order, you > > cannot correctly prepare your data. > > > > There is one area where standardization in ISO relates to work in > Unicode that I can > > think of, and that is sorting. > > Yet UCA conforms to ISO/IEC 14651 (where UCA is cited as entry #28 in the > bibliography). 
> The reverse relationship is irrelevant and would be unfair, given that the > Consortium > refused till now to synchronize UCA and ISO/IEC 14651. > > Here is a need for action. > > > However, sorting, beyond the underlying framework, > > ultimately relates to languages, and language-specific data is now > housed in CLDR. > > > > Early attempts by ISO to standardize a similar framework for locale data > failed, in > > part because the framework alone isn't the interesting challenge for a > repository, > > instead it is the collection, vetting and management of the data. > > For another part it failed because the Consortium refused to cooperate, > despite > repeated proposals for a merger of both instances. > > > > > The reality is that the ISO model and its organizational structures are > not well suited > > to the needs of many important areas where some form of standardization > is needed. > > That's why we have organizations like the IETF, W3C, Unicode, etc. > > > > Duplicating all or even part of their effort inside ISO really serves > nobody's purpose.
Re: The Unicode Standard and ISO
On Thu, 17 May 2018 09:43:28 -0700, Asmus Freytag via Unicode wrote: > > On 5/17/2018 8:08 AM, Martinho Fernandes via Unicode wrote: > > Hello, > > > > There are several mentions of synchronization with related standards in > > unicode.org, e.g. in https://www.unicode.org/versions/index.html, and > > https://www.unicode.org/faq/unicode_iso.html. However, all such mentions > > never mention anything other than ISO 10646. > > Because that is the standard for which there is an explicit understanding by > all involved > relating to synchronization. There have been occasionally some challenging > differences > in the process and procedures, but generally the synchronization is being > maintained, > something that's helped by the fact that so many people are active in both > arenas. Perhaps the cause-effect relationship is somewhat unclear. I think that many people being active in both arenas is helped by the fact that there is a strong will to maintain synching. If there were similar policies notably for ISO/IEC 14651 (collation) and ISO/IEC 15897 (locale data), ISO/IEC 10646 would be far from standing alone in the field of Unicode-ISO/IEC cooperation. > > There are really no other standards where the same is true to the same extent. > > > > I was wondering which ISO standards other than ISO 10646 specify the > > same things as the Unicode Standard, and of those, which ones are > > actively kept in sync. This would be of importance for standardization > > of Unicode facilities in the C++ language (ISO 14882), as reference to > > ISO standards is generally preferred in ISO standards. > > > One of the areas the Unicode Standard differs from ISO 10646 is that its > conception > of a character's identity implicitly contains that character's properties - > and those are > standardized as well and alongside of just name and serial number. 
This is probably why, to date, ISO/IEC 10646 features character properties by including normative references to the Unicode Standard, Standard Annexes, and the UCD. Bidi mirroring, for example, is part of ISO/IEC 10646, which specifies in clause 15.1: “[…] The list of these characters is determined by having the ‘Bidi_Mirrored’ property set to ‘Y’ in the Unicode Standard. These values shall be determined according to the Unicode Standard Bidi Mirrored property (see Clause 2).” > > Many of these properties have associated with them algorithms, e.g. the bidi > algorithm, > that are an essential element of data interchange: if you don't know which > order in > the backing store is expected by the recipient to produce a certain display > order, you > cannot correctly prepare your data. > > There is one area where standardization in ISO relates to work in Unicode > that I can > think of, and that is sorting. Yet UCA conforms to ISO/IEC 14651 (where UCA is cited as entry #28 in the bibliography). The reverse relationship is irrelevant and would be unfair, given that the Consortium refused till now to synchronize UCA and ISO/IEC 14651. Here is a need for action. > However, sorting, beyond the underlying framework, > ultimately relates to languages, and language-specific data is now housed in > CLDR. > > Early attempts by ISO to standardize a similar framework for locale data > failed, in > part because the framework alone isn't the interesting challenge for a > repository, > instead it is the collection, vetting and management of the data. For another part it failed because the Consortium refused to cooperate, despite repeated proposals for a merger of both instances. > > The reality is that the ISO model and its organizational structures are not > well suited > to the needs of many important areas where some form of standardization is > needed. > That's why we have organizations like the IETF, W3C, Unicode, etc. 
> > Duplicating all or even part of their effort inside ISO really serves > nobody's purpose. An undesirable side-effect of not merging Unicode with ISO/IEC 15897 (locale data) is to divert many competent contributors from monitoring CLDR data, especially for French. Here too is a huge need for action. Thanks in advance. Marcel
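For context on what is at stake in the locale-data discussion above: what ISO/IEC 15897 registered, and what CLDR now maintains, is per-locale conventions such as number and date formats. A rough sketch of the idea using whatever locale data the host operating system ships (the `fr_FR.UTF-8` name is an assumption; CLDR itself is typically consumed through libraries such as ICU, not the Python stdlib):

```python
import locale

# Locale repositories answer questions like "what is the decimal
# separator in French?". The stdlib 'locale' module only surfaces
# the data the OS provides, so the French locale may be missing.
try:
    locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
    conv = locale.localeconv()
    print("decimal_point:", conv["decimal_point"])        # ',' in French
    print("thousands_sep:", repr(conv["thousands_sep"]))  # often a space
except locale.Error:
    print("fr_FR.UTF-8 locale not available on this system")
```

Incomplete or unvetted entries in such a repository are precisely what Marcel's complaint about the French locale in CLDR refers to.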
Re: The Unicode Standard and ISO
It would be great if mutual synchronization were considered to be of benefit. Some of us in SC2 are not happy that the Unicode Consortium has published characters which are still under Technical ballot. And this did not happen only once. > On 17 May 2018, at 23:26, Peter Constable via Unicode wrote: > > Hence, from an ISO perspective, ISO 10646 is the only standard for which > on-going synchronization with Unicode is needed or relevant.
RE: The Unicode Standard and ISO
ISO character encoding standards are primarily focused on identifying a repertoire of character elements and their code point assignments in some encoding form. ISO developed other, legacy character-encoding standards in the past, but has not done so for over 20 years. All of those legacy standards can be mapped as a bijection to ISO 10646; in regard to character repertoires, they are all proper subsets of ISO 10646. Hence, from an ISO perspective, ISO 10646 is the only standard for which on-going synchronization with Unicode is needed or relevant. Peter -Original Message- From: Unicode On Behalf Of Martinho Fernandes via Unicode Sent: Thursday, May 17, 2018 8:08 AM To: unicode@unicode.org Subject: The Unicode Standard and ISO Hello, There are several mentions of synchronization with related standards in unicode.org, e.g. in https://www.unicode.org/versions/index.html, and https://www.unicode.org/faq/unicode_iso.html. However, all such mentions never mention anything other than ISO 10646. I was wondering which ISO standards other than ISO 10646 specify the same things as the Unicode Standard, and of those, which ones are actively kept in sync. This would be of importance for standardization of Unicode facilities in the C++ language (ISO 14882), as reference to ISO standards is generally preferred in ISO standards. -- Martinho
Re: The Unicode Standard and ISO
On 5/17/2018 8:08 AM, Martinho Fernandes via Unicode wrote: Hello, There are several mentions of synchronization with related standards in unicode.org, e.g. in https://www.unicode.org/versions/index.html, and https://www.unicode.org/faq/unicode_iso.html. However, all such mentions never mention anything other than ISO 10646. Because that is the standard for which there is an explicit understanding by all involved relating to synchronization. There have been occasionally some challenging differences in the process and procedures, but generally the synchronization is being maintained, something that's helped by the fact that so many people are active in both arenas. There are really no other standards where the same is true to the same extent. I was wondering which ISO standards other than ISO 10646 specify the same things as the Unicode Standard, and of those, which ones are actively kept in sync. This would be of importance for standardization of Unicode facilities in the C++ language (ISO 14882), as reference to ISO standards is generally preferred in ISO standards. One of the areas the Unicode Standard differs from ISO 10646 is that its conception of a character's identity implicitly contains that character's properties - and those are standardized as well and alongside of just name and serial number. Many of these properties have associated with them algorithms, e.g. the bidi algorithm, that are an essential element of data interchange: if you don't know which order in the backing store is expected by the recipient to produce a certain display order, you cannot correctly prepare your data. There is one area where standardization in ISO relates to work in Unicode that I can think of, and that is sorting. However, sorting, beyond the underlying framework, ultimately relates to languages, and language-specific data is now housed in CLDR. 
Early attempts by ISO to standardize a similar framework for locale data failed, in part because the framework alone isn't the interesting challenge for a repository; instead it is the collection, vetting and management of the data. The reality is that the ISO model and its organizational structures are not well suited to the needs of many important areas where some form of standardization is needed. That's why we have organizations like the IETF, W3C, Unicode, etc. Duplicating all or even part of their effort inside ISO really serves nobody's purpose. A./
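The bidi remark above is the clearest example of properties carrying the real interoperability load: each character's Bidi_Class feeds the Unicode Bidirectional Algorithm, which maps backing-store (logical) order to display order. A quick look at these classes with Python's stdlib:

```python
import unicodedata

# Bidi_Class values: L = left-to-right letter, R = right-to-left
# letter, EN = European number, WS = whitespace. The Unicode
# Bidirectional Algorithm reorders a line for display based on these
# classes; the backing store itself always stays in logical order.
for ch in ("A", "\u05d0", "1", " "):
    print(f"U+{ord(ch):04X} -> {unicodedata.bidirectional(ch)}")
```

Without agreement on these property values between sender and recipient, the same stored string could legitimately display in different orders, which is Asmus's data-interchange point.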