Re: The Unicode Standard and ISO [localizable sentences]

2018-06-12 Thread Sarasvati via Unicode
The topic of localizable sentences is now closed on this mail list.
Please take that topic elsewhere.
Thank you.


On 6/12/2018 10:49 AM, Mark Davis ☕️ via Unicode wrote:

> That is often a viable approach. But proponents shouldn't get the wrong
> impression. I think anything resembling the "localized sentences" /
> "international message components" has zero chance of being adopted by
> Unicode (including the encoding, CLDR, anything). It is a waste of many
> people's time discussing it further on this list.

> Why? As discussed many times on this list, it would take a major effort,
> is not scoped properly (the translation of messages depends highly on
> context, including specific products), and would not meet the needs of
> practically anyone.

> People interested in this topic should  
> (a) start up their own project somewhere else, 
> (b) take discussion of it off this list, 
> (c) never bring it up again on this list.



Re: The Unicode Standard and ISO

2018-06-12 Thread Steven R. Loomis via Unicode
On Mon, Jun 11, 2018 at 8:32 AM, William_J_G Overington <
wjgo_10...@btinternet.com> wrote:

> Steven R. Loomis wrote:
>
> >Marcel,
> > The idea is not necessarily without merit. However, CLDR does not
> usually expand scope just because of a suggestion.
>  I usually recommend creating a new project first - gathering data,
> looking at and talking to projects to ascertain the usefulness of common
> messages. One of the barriers to adding new content for CLDR is not just
> the design, but collecting initial data. When emoji or sub-territory names
> were added, many languages were included before it was added to CLDR.
>
> Well, maybe usually, but perhaps not this time?


Especially this time.
To Mark's later point: Start a separate project. Don't assume it will ever
merge with CLDR. If it succeeds, great.


Re: The Unicode Standard and ISO

2018-06-12 Thread Mark Davis ☕️ via Unicode
Steven wrote:

>  I usually recommend creating a new project first...

That is often a viable approach. But proponents shouldn't get the wrong
impression. I think anything resembling the "localized sentences" /
"international message components" has zero chance of being adopted by
Unicode (including the encoding, CLDR, anything). It is a waste of many
people's time discussing it further on this list.

Why? As discussed many times on this list, it would take a major effort, is
not scoped properly (the translation of messages depends highly on context,
including specific products), and would not meet the needs of practically
anyone.

People interested in this topic should
(a) start up their own project somewhere else,
(b) take discussion of it off this list,
(c) never bring it up again on this list.


Mark

On Tue, Jun 12, 2018 at 4:53 PM, Marcel Schneider via Unicode <
unicode@unicode.org> wrote:

>
> William,
>
> On 12/06/18 12:26, William_J_G Overington wrote:
> >
> > Hi Marcel
> >
> > > I don’t fully disagree with Asmus, as I suggested to make available
> localizable (and effectively localized) libraries of message components,
> rather than of entire messages.
> >
> > Could you possibly give some examples of the message components to which
> you refer please?
> >
>
> Likewise I’d be interested in asking Jonathan Rosenne for an example or
> two of automated translation from English to bidi languages with data
> embedded,
> as on Mon, 11 Jun 2018 15:42:38 +, Jonathan Rosenne via Unicode wrote:
> […]
> > > > One has to see it to believe what happens to messages translated
> mechanically from English to bidi languages when data is embedded in the
> text.
>
> But both would require launching a new thread.
>
> Thinking hard enough, I’m even afraid that most subscribers wouldn’t be
> interested, so we’d have to move off-list.
>
> One alternative I can think of is to use one of the CLDR mailing lists. I
> subscribed to CLDR-users when I was directed to move there some technical
> discussion
> about keyboard layouts from Unicode Public.
>
> But now as international message components are not yet a part of CLDR,
> we’d need to ask for extra permission to do so.
>
> An additional drawback of launching a technical discussion right now is
> that significant parts of CLDR data are not yet correctly localized so
> there is another
> bunch of priorities under July 11 deadline. I guess that vendors wouldn’t
> be glad to see us gathering data for new structures while level=Modern
> isn’t complete.
>
> In the meantime, you are welcome to contribute and to motivate missing
> people to do the same.
>
> Best regards,
>
> Marcel
>
>


Re: The Unicode Standard and ISO

2018-06-12 Thread Steven R. Loomis via Unicode
> ISO 15924 is an ISO standard. Aspects of its content may be mirrored in
> other places, but “moving its content” to CLDR makes no sense.

Fully agreed.

For what it's worth, I reopened a bug of Roozbeh's
https://unicode.org/cldr/trac/ticket/827?#comment:9 to make sure the ISO
15924 French content gets properly mirrored into CLDR, it looks like there
is a French-specific bug there, which may be what you are seeing, Marcel.


On Tue, Jun 12, 2018 at 8:57 AM, Michael Everson via Unicode <
unicode@unicode.org> wrote:

> All right, if you want a clear explanation.
>
> Yes, I think the ISO 8859-4 character names for the Latvian letters were
> mistaken. Yes, I think that mapping them to decompositions with CEDILLA
> rather than COMMA BELOW was a mistake. Evidently some felt that the
> normative mapping was important. This does not mean that SC2 “failed to do
> its part” and it did not cause a lack of desire for cooperation, and it
> bloody well did not “damage the reputation of the whole ISO/IEC”.
>
> As to ISO 15924, it was developed bilingually, and there was consensus on
> the names that are there. Last year you suggested a massive number of name
> changes to the French translation of ISO/IEC 10646, and I criticized you
> for forgoing stability for your own preferences. When it came to the names
> in 15924, I told you that I do not trust your judgement, and that I would
> consider revisions to the French names when you came back with consensus on
> those changes with experts Alain LaBonté, Patrick Andries, Denis Jacquerye,
> and Marc Lodewijck. As I have not heard from them, I conclude that no such
> consensus exists.
>
> ISO 15924 is an ISO standard. Aspects of its content may be mirrored in
> other places, but “moving its content” to CLDR makes no sense.
>
> Michael Everson
>
> > On 12 Jun 2018, at 16:20, Marcel Schneider via Unicode <
> unicode@unicode.org> wrote:
> > On Tue, 12 Jun 2018 15:58:09 +0100, Michael Everson via Unicode wrote:
> >>
> >> Marcel,
> >> You have put words into my mouth. Please don’t. Your description of
> what I said is NOT accurate.
> >>
> >>> On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode  wrote:
> >>> And in this thread I wanted to demonstrate that by focusing on the
> wrong priorities, i.e. legacy character names instead of the practicability
> of on-going encoding and the accurateness of specified decompositions—so
> that in some instances cedilla was used instead of comma below, Michael
> pointed out—, ISO/IEC JTC1 SC2/WG2 failed to do its part and missed its
> mission—and thus didn’t inspire a desire of extensive cooperation (and
> damaged the reputation of the whole ISO/IEC).
> >
> > Michael, I’d better quote your actual e-mail:
> >
> > On Fri, 8 Jun 2018 13:01:48 +0100, Michael Everson via Unicode wrote:
> > […]
> >> Many things have more than one name. The only truly bad misnomers from
> that period were related to a mapping error,
> >> namely, in the treatment of Latvian characters which are called CEDILLA
> rather than COMMA BELOW.
> >
> > Now I fail to understand why this mustn’t be reworded to “the
> accurateness of specified decompositions—so that in some instances cedilla
> was used instead of comma below[.]” If any correction can be made, I’d be
> eager to take note. Thanks for correcting.
> >
> > Now let’s append the e-mail that I was about to send:
> >
> > Another ISO Standard that needs to be mentioned in this thread is ISO
> 15924 (script codes; not ISO/IEC). It has a particular status in that
> Unicode is the Registration Authority.
> >
> > I wonder whether people agree that it has a French version. Actually it
> does have a French version, but Michael Everson (Registrar) revealed on
> this List multiple issues with synching French script names in ISO 15924-fr
> and in Code Charts translations.
> >
> > Shouldn’t this content be moved to CLDR? At least with respect to
> localized script names.
>
>
>


Re: The Unicode Standard and ISO

2018-06-12 Thread Asmus Freytag via Unicode

  
  
On 6/12/2018 7:58 AM, Michael Everson via Unicode wrote:

> Marcel,
>
> You have put words into my mouth. Please don’t. Your description of what I
> said is NOT accurate.
>
>> On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode  wrote:
>>
>> And in this thread I wanted to demonstrate that by focusing on the wrong
>> priorities, i.e. legacy character names instead of the practicability of
>> on-going encoding and the accurateness of specified decompositions—so that
>> in some instances cedilla was used instead of comma below, Michael pointed
>> out—, ISO/IEC JTC1 SC2/WG2 failed to do its part and missed its mission—and
>> thus didn’t inspire a desire of extensive cooperation (and damaged the
>> reputation of the whole ISO/IEC).

The final conclusion isn't backed by the evidence.

This kind of fault-finding needs to stop - it's unproductive.

A./

  



Re: The Unicode Standard and ISO

2018-06-12 Thread Steven R. Loomis via Unicode
CLDR already has localized script names. The English is taken from ISO
15924. https://cldr-ref.unicode.org/cldr-apps/v#/fr/Scripts/

On Tue, Jun 12, 2018 at 8:20 AM, Marcel Schneider via Unicode <
unicode@unicode.org> wrote:

> On Tue, 12 Jun 2018 15:58:09 +0100, Michael Everson via Unicode wrote:
> >
> > Marcel,
> >
> > You have put words into my mouth. Please don’t. Your description of what
> I said is NOT accurate.
> >
> > > On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode  wrote:
> > >
> > > And in this thread I wanted to demonstrate that by focusing on the
> wrong priorities, i.e. legacy character names instead of
> > > the practicability of on-going encoding and the accurateness of
> specified decompositions—so that in some instances cedilla
> > > was used instead of comma below, Michael pointed out—, ISO/IEC JTC1
> SC2/WG2 failed to do its part and missed its mission—
> > > and thus didn’t inspire a desire of extensive cooperation (and damaged
> the reputation of the whole ISO/IEC).
>
> Michael, I’d better quote your actual e-mail:
>
> On Fri, 8 Jun 2018 13:01:48 +0100, Michael Everson via Unicode wrote:
> […]
> > Many things have more than one name. The only truly bad misnomers from
> that period were related to a mapping error,
> > namely, in the treatment of Latvian characters which are called CEDILLA
> rather than COMMA BELOW.
>
> Now I fail to understand why this mustn’t be reworded to “the accurateness
> of specified decompositions—so that in some instances
> cedilla was used instead of comma below[.]”
> If any correction can be made, I’d be eager to take note.
> Thanks for correcting.
>
> Now let’s append the e-mail that I was about to send:
>
> Another ISO Standard that needs to be mentioned in this thread is ISO
> 15924 (script codes; not ISO/IEC).
> It has a particular status in that Unicode is the Registration Authority.
>
> I wonder whether people agree that it has a French version. Actually it
> does have a French version, but
> Michael Everson (Registrar) revealed on this List multiple issues with
> synching French script names in
> ISO 15924-fr and in Code Charts translations.
>
> Shouldn’t this content be moved to CLDR? At least with respect to
> localized script names.
>


Re: The Unicode Standard and ISO

2018-06-12 Thread Michael Everson via Unicode
All right, if you want a clear explanation.

Yes, I think the ISO 8859-4 character names for the Latvian letters were 
mistaken. Yes, I think that mapping them to decompositions with CEDILLA rather 
than COMMA BELOW was a mistake. Evidently some felt that the normative mapping 
was important. This does not mean that SC2 “failed to do its part” and it did 
not cause a lack of desire for cooperation, and it bloody well did not “damage 
the reputation of the whole ISO/IEC”. 
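
For readers who want to see the mapping under discussion, here is a minimal
sketch using Python's standard unicodedata module. It prints the character
name and canonical decomposition of the four Latvian letters whose names say
CEDILLA, and, for contrast, of a Romanian letter that was encoded with COMMA
BELOW. The choice of sample letters and the helper name describe() are
illustrative assumptions, not something taken from this thread.

```python
import unicodedata

# Latvian letters whose Unicode names (and decompositions) use CEDILLA,
# followed by a Romanian letter that uses COMMA BELOW, for contrast.
# The selection of sample characters is an assumption for illustration.
SAMPLES = ["\u0123", "\u0137", "\u013C", "\u0146", "\u0219"]


def describe(ch):
    """Return (code point, character name, canonical decomposition)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
    )


if __name__ == "__main__":
    for ch in SAMPLES:
        code_point, name, decomposition = describe(ch)
        # The second code point of the decomposition is the combining mark.
        mark = chr(int(decomposition.split()[1], 16))
        print(code_point, name, "->", decomposition,
              "(" + unicodedata.name(mark) + ")")
```

Character names and normative decompositions are stable in Unicode, which is
why the output still shows CEDILLA for the Latvian letters today.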

As to ISO 15924, it was developed bilingually, and there was consensus on the 
names that are there. Last year you suggested a massive number of name changes 
to the French translation of ISO/IEC 10646, and I criticized you for forgoing 
stability for your own preferences. When it came to the names in 15924, I told 
you that I do not trust your judgement, and that I would consider revisions to 
the French names when you came back with consensus on those changes with 
experts Alain LaBonté, Patrick Andries, Denis Jacquerye, and Marc Lodewijck. As 
I have not heard from them, I conclude that no such consensus exists. 

ISO 15924 is an ISO standard. Aspects of its content may be mirrored in other 
places, but “moving its content” to CLDR makes no sense. 

Michael Everson

> On 12 Jun 2018, at 16:20, Marcel Schneider via Unicode  
> wrote:
> On Tue, 12 Jun 2018 15:58:09 +0100, Michael Everson via Unicode wrote:
>> 
>> Marcel,
>> You have put words into my mouth. Please don’t. Your description of what I 
>> said is NOT accurate. 
>> 
>>> On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode  wrote:
>>> And in this thread I wanted to demonstrate that by focusing on the wrong 
>>> priorities, i.e. legacy character names instead of the practicability of 
>>> on-going encoding and the accurateness of specified decompositions—so that 
>>> in some instances cedilla was used instead of comma below, Michael pointed 
>>> out—, ISO/IEC JTC1 SC2/WG2 failed to do its part and missed its mission—and 
>>> thus didn’t inspire a desire of extensive cooperation (and damaged the 
>>> reputation of the whole ISO/IEC).
> 
> Michael, I’d better quote your actual e-mail:
> 
> On Fri, 8 Jun 2018 13:01:48 +0100, Michael Everson via Unicode wrote:
> […]
>> Many things have more than one name. The only truly bad misnomers from that 
>> period were related to a mapping error,
>> namely, in the treatment of Latvian characters which are called CEDILLA 
>> rather than COMMA BELOW. 
> 
> Now I fail to understand why this mustn’t be reworded to “the accurateness of 
> specified decompositions—so that in some instances cedilla was used instead 
> of comma below[.]” If any correction can be made, I’d be eager to take note. 
> Thanks for correcting.
> 
> Now let’s append the e-mail that I was about to send:
> 
> Another ISO Standard that needs to be mentioned in this thread is ISO 15924 
> (script codes; not ISO/IEC). It has a particular status in that Unicode is 
> the Registration Authority. 
> 
> I wonder whether people agree that it has a French version. Actually it does 
> have a French version, but Michael Everson (Registrar) revealed on this List 
> multiple issues with synching French script names in ISO 15924-fr and in Code 
> Charts translations.
> 
> Shouldn’t this content be moved to CLDR? At least with respect to localized 
> script names.





Re: The Unicode Standard and ISO

2018-06-12 Thread Marcel Schneider via Unicode
On Tue, 12 Jun 2018 15:58:09 +0100, Michael Everson via Unicode wrote:
> 
> Marcel,
> 
> You have put words into my mouth. Please don’t. Your description of what I 
> said is NOT accurate. 
> 
> > On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode  wrote:
> > 
> > And in this thread I wanted to demonstrate that by focusing on the wrong 
> > priorities, i.e. legacy character names instead of
> > the practicability of on-going encoding and the accurateness of specified 
> > decompositions—so that in some instances cedilla
> > was used instead of comma below, Michael pointed out—, ISO/IEC JTC1 SC2/WG2 
> > failed to do its part and missed its mission—
> > and thus didn’t inspire a desire of extensive cooperation (and damaged the 
> > reputation of the whole ISO/IEC).

Michael, I’d better quote your actual e-mail:

On Fri, 8 Jun 2018 13:01:48 +0100, Michael Everson via Unicode wrote:
[…]
> Many things have more than one name. The only truly bad misnomers from that 
> period were related to a mapping error,
> namely, in the treatment of Latvian characters which are called CEDILLA 
> rather than COMMA BELOW. 

Now I fail to understand why this mustn’t be reworded to “the accurateness of 
specified decompositions—so that in some instances 
cedilla was used instead of comma below[.]”
If any correction can be made, I’d be eager to take note.
Thanks for correcting.

Now let’s append the e-mail that I was about to send:

Another ISO Standard that needs to be mentioned in this thread is ISO 15924 
(script codes; not ISO/IEC).
It has a particular status in that Unicode is the Registration Authority. 

I wonder whether people agree that it has a French version. Actually it does 
have a French version, but 
Michael Everson (Registrar) revealed on this List multiple issues with synching 
French script names in 
ISO 15924-fr and in Code Charts translations.

Shouldn’t this content be moved to CLDR? At least with respect to localized 
script names.



Re: The Unicode Standard and ISO

2018-06-12 Thread Michael Everson via Unicode
Marcel,

You have put words into my mouth. Please don’t. Your description of what I said 
is NOT accurate. 

> On 12 Jun 2018, at 03:53, Marcel Schneider via Unicode  
> wrote:
> 
> And in this thread I wanted to demonstrate that by focusing on the wrong 
> priorities, i.e. legacy character names instead of the practicability of 
> on-going encoding and the accurateness of specified decompositions—so that in 
> some instances cedilla was used instead of comma below, Michael pointed out—, 
> ISO/IEC JTC1 SC2/WG2 failed to do its part and missed its mission—and thus 
> didn’t inspire a desire of extensive cooperation (and damaged the reputation 
> of the whole ISO/IEC).




Re: The Unicode Standard and ISO

2018-06-12 Thread Marcel Schneider via Unicode


William,

On 12/06/18 12:26, William_J_G Overington wrote:
> 
> Hi Marcel
> 
> > I don’t fully disagree with Asmus, as I suggested to make available 
> > localizable (and effectively localized) libraries of message components, 
> > rather than of entire messages.
> 
> Could you possibly give some examples of the message components to which you 
> refer please?
> 

Likewise I’d be interested in asking Jonathan Rosenne for an example or two of 
automated translation from English to bidi languages with data embedded, 
as on Mon, 11 Jun 2018 15:42:38 +, Jonathan Rosenne via Unicode wrote:
[…]
> > > One has to see it to believe what happens to messages translated 
> > > mechanically from English to bidi languages when data is embedded in the 
> > > text. 

But both would require launching a new thread. 

Thinking hard enough, I’m even afraid that most subscribers wouldn’t be 
interested, so we’d have to move off-list. 

One alternative I can think of is to use one of the CLDR mailing lists. I 
subscribed to CLDR-users when I was directed to move there some technical 
discussion 
about keyboard layouts from Unicode Public.

But now as international message components are not yet a part of CLDR, we’d 
need to ask for extra permission to do so.

An additional drawback of launching a technical discussion right now is that 
significant parts of CLDR data are not yet correctly localized so there is 
another
bunch of priorities under July 11 deadline. I guess that vendors wouldn’t be 
glad to see us gathering data for new structures while level=Modern isn’t 
complete.

In the meantime, you are welcome to contribute and to motivate missing people 
to do the same.

Best regards,

Marcel



Re: The Unicode Standard and ISO

2018-06-12 Thread William_J_G Overington via Unicode
Hi Marcel

> I don’t fully disagree with Asmus, as I suggested to make available 
> localizable (and effectively localized) libraries of message components, 
> rather than of entire messages.

Could you possibly give some examples of the message components to which you 
refer please?

Asmus wrote:

> A middle ground is a shared terminology database that allows translators 
> working on different products to arrive at the same translation for the same 
> things. Translators already know how to use such databases in their work 
> flow, and integrating a shared one with a product-specific one is much easier 
> than trying to deal with a set of random error messages.

I am not a linguist. I am interested in languages but my knowledge of languages 
is little more than that of general education, though I have written a song in 
French.

http://www.users.globalnet.co.uk/~ngo/une_chanson.pdf

So when Asmus wrote "Translators already know how to use such databases in 
their work flow, ", I do not know how to do that myself.

> The challenge as I see it is to get them translated to all locales.

Well, yes, that is a big challenge.

It depends whether people want to get it done.

In England, with its changeable weather, part of the culture is to talk about 
the weather. For example, at a bus stop talking about the weather with other 
people: it is sociable without being intrusive or controversial. Alas it did 
not occur to me that that might seem strange to some people who are not from 
England.

http://www.english-at-home.com/speaking/talking-about-the-weather/

http://www.bbc.com/future/story/20151214-why-do-brits-talk-about-the-weather-so-much

I remember when I wrote about localizable sentences in this mailing list in 
mid-April 2009, using sentences about the weather, I hoped, in hindsight rather 
naively, that people on the mailing list would be interested and that 
translations into many languages would be posted and then things would get 
going.

In the event, only one person, Magnus Bodin, provided translations. Magnus 
provided translations into Swedish and also provided a translation for an 
additional sentence as well. I knew no Swedish myself. These translations have 
been extremely helpful in my research project as they demonstrate communication 
through the language barrier using encoded localizable sentences.

Yesterday I provided three example error message sentences.

https://www.unicode.org/mail-arch/unicode-ml/y2018-m06/0088.html

Please consider one of them. An application program could output it as a code 
number, say ::4842357:;, if someone enters a letter of the alphabet into a 
currency field. The code would then be displayed localized into a language by 
decoding it using a sentence.dat UTF-16 text file for that language, which 
includes a line that starts ::4842357:;| and then has the localization into 
that particular language, the language being any language that can be displayed 
using Unicode.

For English, the line in the sentence.dat file would be as follows.

::4842357:;|Data entry for the currency field must be either a whole positive 
number or a positive number to exactly two decimal places.

It would be great if some bilingual readers of this mailing list were to post a 
translation of the above line of text into another language.
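
The lookup described above could be sketched roughly as follows in Python. 
This is only an illustration under stated assumptions: the function names 
load_sentences() and localize() are hypothetical, the file is assumed to hold 
one "code|text" entry per line, and falling back to the raw code when no 
entry exists is a design choice of this sketch, not part of any specification.

```python
import os
import tempfile


def load_sentences(path):
    """Read a sentence.dat-style UTF-16 file into a dict of code -> text.

    Assumes one entry per line in the form ::4842357:;|localized text
    """
    table = {}
    with open(path, encoding="utf-16") as f:
        for line in f:
            line = line.rstrip("\n")
            code, sep, text = line.partition("|")
            if sep:  # skip blank lines and lines without a separator
                table[code] = text
    return table


def localize(code, table):
    """Return the localized sentence, or the code itself if it is unknown."""
    return table.get(code, code)


if __name__ == "__main__":
    english = (
        "::4842357:;|Data entry for the currency field must be either a "
        "whole positive number or a positive number to exactly two decimal "
        "places.\n"
    )
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sentence.dat")
        # Writing with encoding="utf-16" adds a byte order mark, which the
        # reader above consumes automatically.
        with open(path, "w", encoding="utf-16") as f:
            f.write(english)
        table = load_sentences(path)
        print(localize("::4842357:;", table))
```

A sentence.dat file for another language would carry the same code with the 
text in that language, so the application itself would only ever emit the 
code.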

In my research I am using an integral sign as a base character and circled 
digit characters.

If possible, a character such as U+FFF7 could be encoded to be the base 
character as that would provide a unique unambiguous link to star space from 
Unicode plain text. However whether that happens at some future time will 
depend upon there being sufficient interest at that future time in using 
localizable sentences for communication through the language barrier.

William Overington

Tuesday 12 June 2018