
Re: Tags and custom vector glyph emoji (from Re: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final))

2017-04-04 Thread William_J_G Overington

Philippe Verdy wrote:

> What you are describing is reinventing the wheel, notably basically what SVG 
> paths already define.

Well, I am trying to express, within a tag sequence that could be included in 
an interoperable Unicode plain text message, the glyph information for one 
emoji glyph of an OpenType colour font.

I have not included anything about SVG.

> Font encoding technologies define their own system using multiple tables and 
> a compact dictionary of tables with binary encoding, not suitable for 
> inclusion in plain-text.

Yes, that is why I have devised this format, so that the glyph information for 
one emoji glyph of an OpenType colour font could be included in a Unicode plain 
text message.

> Note also that Emojis could be animated when rendered on screen (that's what 
> we already see in many implementations using GIF icons for their emojis, even 
> if they are not easily resizable). Animated SVG for now is still in beta but 
> starts being used on some sites and rendered by web browsers. SVG images may 
> also be scripted and may include accessibility features (e.g. with sound played 
> or hint bubbles displayed when hovering them).

The format that I suggested could be extended if desired.

For example, h is for an unanimated glyph.

There could be added q and e if desired, so that instead of h one uses q for 
completing the glyph for each frame, and then e to export the complete animated 
glyph.

For example, as follows.

q means {define a complete glyph of advance width w from the glyph or glyphs in 
the glyphs buffer and place it in the animation buffer; reset everything except 
the animation buffer ready to define the next glyph in the animation;}

e means {produce an animated glyph from the contents of the animation buffer 
ready for access by the main program; halt;}

Yes, accessibility features are important and I will try to think about 
including them. Readers are welcome to make suggestions as to what is needed.

> You only cover a part of what is needed 

Well, yes, I suppose so, yet what I have published could get something started 
and anything else that is needed could be added, either by me or by the Unicode 
Technical Committee and the Emoji Subcommittee if people are interested in 
implementing the idea.

> but hope that someone will invest time to implement it in a renderer:

Well, yes eventually.

I am hoping that the idea will be discussed in the mailing list and then go 
forward to the Emoji Subcommittee and then go to the Unicode Technical 
Committee and then become part of The Unicode Standard and then be used by 
people.

Many people think of new encoding ideas and put them forward to the Unicode 
Technical Committee, sometimes starting with a post in this mailing list before 
a formal submission in the hope that the discussion will be helpful. Such 
discussion often improves the formal submission. That is the process, the way 
that Unicode progresses.

> ... developers prefer investing time in SVG renderers or existing font 
> technologies for OpenType (SVG fonts will come later when it will be capable 
> of doing the same things as OpenType, for now it does not cover all the 
> existing needs).

Well, I do not know what developers prefer. There seems to be a need to send 
custom emoji in interoperable Unicode plain text and I have put forward an idea 
for how to do it.

William Overington

Tuesday 4 April 2017

Re: Tags and custom vector glyph emoji (from Re: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final))

2017-04-04 Thread Philippe Verdy

2017-04-04 12:18 GMT+02:00 William_J_G Overington:

> > ... developers prefer investing time in SVG renderers or existing font
> > technologies for OpenType (SVG fonts will come later when it will be
> > capable of doing the same things as OpenType, for now it does not cover
> > all the existing needs).
>
> Well, I do not know what developers prefer. There seems to be a need to
> send custom emoji in interoperable Unicode plain text and I have put
> forward an idea for how to do it.
>

You only know what you yourself prefer: can't you see that what you
propose is even less powerful than a **STANDARD** SVG path? It already
has everything you "propose", except that it is already widely implemented
and developers will prefer reusing it directly.

An SVG path looks like "M100,100h800v800h-800z" to draw an 800-unit square
centered in a 1000-unit square; there's no need for "x" or "y", as there are
shortcuts already defined for horizontal or vertical strokes (using
relative or absolute coordinates) and path closure, and it supports
straight segments, cubic and quadratic splines and elliptic arcs. Its
internal "machine" is very well documented (with extensive conformance
tests for renderers, including for all supported geometric transforms and
conversion of paths for creating stroke styles instead of filling them
directly).
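
To make the quoted path concrete, here is a minimal sketch in Python that traces the corner points of a path such as "M100,100h800v800h-800z". It is an illustration only: the function name trace_path is made up for this note, it handles only the M/m, L/l, H/h, V/v and Z/z commands, and it ignores curves, arcs and implicit command repetition, all of which a real SVG renderer supports.

```python
import re


def trace_path(d):
    """Return the vertices visited by a simple SVG path string.

    Hypothetical helper: only M/m, L/l, H/h, V/v and Z/z are understood.
    """
    tokens = re.findall(r"[MmLlHhVvZz]|-?\d+(?:\.\d+)?", d)
    points = []
    x = y = 0.0
    start = (0.0, 0.0)  # where the current subpath began (used by z)
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        i += 1
        if cmd in ("Z", "z"):
            # Close the path by returning to the start of the subpath.
            x, y = start
        elif cmd in ("M", "m", "L", "l"):
            px, py = float(tokens[i]), float(tokens[i + 1])
            i += 2
            # Lowercase commands are relative to the current point.
            x, y = (x + px, y + py) if cmd.islower() else (px, py)
            if cmd in ("M", "m"):
                start = (x, y)
        elif cmd in ("H", "h"):
            value = float(tokens[i])
            i += 1
            x = x + value if cmd == "h" else value
        elif cmd in ("V", "v"):
            value = float(tokens[i])
            i += 1
            y = y + value if cmd == "v" else value
        else:
            raise ValueError(f"unsupported token: {cmd!r}")
        points.append((x, y))
    return points


square = trace_path("M100,100h800v800h-800z")
print(square)
```

For the example path this visits (100,100), (900,100), (900,900), (100,900) and then returns to (100,100), i.e. the 800-unit square described above.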

Tags and custom vector glyph emoji (from Re: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final))

2017-04-03 Thread William_J_G Overington

Peter Constable wrote:

> William, you completely miss the point: As long as Unicode is the way to 
> provide emoji to consumers, their needs and desires will not be best or fully 
> met. Unicode as an AND gate is too many AND gates.

Ah, I understand what you mean now.

In my feedback of 7 March 2017 to PRI #348 on the Length of Tag Sequences I 
included the following.

quote

 for example, a vector glyph in a platform-independent colour-font-style 
contour format could be expressed using tag characters.

end quote

Following your post and my now understanding your meaning I have written some 
notes about the above possibility.

Previously I have made some colour fonts using the High-Logic FontCreator 
program.

I do not claim to be expert on the OpenType colour font format, yet I know 
about the idea of having several glyphs with each such glyph being of one 
colour and then combining them to produce a colourful glyph and I also know 
about the option to include a default monochrome glyph.

I enjoy trying to devise encoding systems, so I have tried to produce a way to 
send the information for a colourful glyph within a tag sequence.

I am thinking that a future email or text message reception system could decode 
the tag sequence and add a colourful glyph to the font being used to display 
the message.

This method, if people can get it to work satisfactorily, would allow custom 
vector glyph emoji within an interoperable plain text system.

Here is a transcript of what I have produced so far. Readers of this thread are 
invited to have a look at the idea and are welcome to try to implement it if 
they so choose. If any additions are needed, or indeed if any changes are 
needed, please say. There needs to be a way so that the tag sequence for the 
glyph for a particular character is only sent once in a message even though the 
character may be used more than once in the message.

Tags and custom vector glyph emoji

Some notes as at Monday 3 April 2017 19:04 British Summer Time

A tag sequence for this purpose starts with a capital letter V standing for 
vector format.

At the start of the sequence a:=255; b:=0; g:=0; m:=1; p:=0; r:=0; x:=0; y:=0; 
w:=1000;

At the start of the sequence the points buffer is empty, the contours buffer is 
empty and the glyphs buffer is empty.

The system uses a special-purpose virtual computing engine within a software 
sandbox. The special-purpose virtual computing engine has no commands for loops 
and is a single pass interpretative system. 

Letters that are each used both as a command and also as the name of a register 
in the special-purpose virtual computing engine.

a means {a:=p; p:=0; m:=0;}

b means {b:=p; p:=0; m:=0;}

g means {g:=p; p:=0; m:=0;}

m means {m:=1;}

p means {p:=0;}

r means {r:=p; p:=0; m:=0;}

x means {x:=p; p:=0;}

y means {y:=p; p:=0;}

w means {w:=p; p:=0;}

Letters that are used as a command but not as the name of a register in the 
special-purpose virtual computing engine.

c means {define a closed contour from the points in the points buffer; clear 
the points buffer ready for the next point; x:=0; y:=0; p:=0;}

d means {define a glyph from the contour or contours in the contours buffer; if 
m=1 then the glyph is the first glyph and is the monochrome glyph, else the 
glyph is of colour (r, g, b, a) and is not the first glyph; clear the contours 
buffer ready for the next glyph; clear the points buffer ready for the next 
point; a:=255; b:=0; g:=0; r:=0; x:=0; y:=0; p:=0; m:=0;}

The use of the m register is so that a default monochrome glyph may optionally 
be included as the first glyph defined. If any component of the colour or 
opacity is defined before a d command is used, then the monochrome component is 
left empty.

f means {define an off curve point using x and y; x:=0; y:=0; p:=0;}

h means {define a complete glyph of advance width w from the glyph or glyphs in 
the glyphs buffer and have it ready for access by the main program; halt;}

n means {define an on curve point using x and y; x:=0; y:=0; p:=0;}

Digits

Digits 0 .. 9 each mean p:=10*p + (digit);

The system is designed to be notionally for an emoji glyph within a virtual 
space of (x from 0 .. 1000 and y from 0 .. 1000). These values may be scaled to 
fit with the metrics of a real world font with which a glyph communicated using 
this system is applied.

A tag sequence for this purpose ends with a cancel tag.
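
As an illustration of how the letters and digits of such a sequence could be carried in tag characters, here is a small Python sketch. It relies on the fact that the Unicode tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E and that U+E007F is CANCEL TAG. The function names to_tags and from_tags are invented for this note, and the base character that would precede the tags in a message is not specified in the post, so it is left out here.

```python
TAG_OFFSET = 0xE0000          # tag character = ASCII code point + 0xE0000
CANCEL_TAG = "\U000E007F"     # U+E007F CANCEL TAG ends the sequence


def to_tags(ascii_text):
    """Encode printable ASCII as tag characters followed by CANCEL TAG."""
    for ch in ascii_text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
    return "".join(chr(TAG_OFFSET + ord(ch)) for ch in ascii_text) + CANCEL_TAG


def from_tags(tag_text):
    """Decode tag characters back to ASCII, stopping at CANCEL TAG."""
    out = []
    for ch in tag_text:
        if ch == CANCEL_TAG:
            break
        code_point = ord(ch)
        if not 0xE0020 <= code_point <= 0xE007E:
            raise ValueError(f"not a tag character: U+{code_point:04X}")
        out.append(chr(code_point - TAG_OFFSET))
    return "".join(out)


encoded = to_tags("V250x900yn")
print([f"U+{ord(ch):05X}" for ch in encoded])
print(from_tags(encoded))
```

The first printed list starts with U+E0056, the tag form of the capital letter V, and ends with U+E007F.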

Some basic examples of parts of a tag sequence to provide an idea of how the 
system would be used.

The following part of a tag sequence would set the x register to have the value 
250.

250x

The following part of a tag sequence would define an on-curve point at (x,y) = 
(250, 900)

250x900yn

The following part of a tag sequence would define a contour.

250x900yn800x500yf250x100ync

The following part of a tag sequence would define a colour glyph that has one 
contour.

250x900yn800x500yf250x100ync255b128gd
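
For readers who would like to experiment, the notes above can be sketched as a single-pass interpreter in Python. This is only one possible reading of the notes, not a reference implementation. Assumptions made for the sketch: the input is the ASCII form of the sequence (after decoding the tag characters), c appends the contour to the contours buffer, d appends the glyph to the glyphs buffer, a monochrome glyph is recorded with colour None, and the output data structure and the name run are invented here.

```python
def run(program):
    """Interpret a 'V...' command string and return the glyph built by h."""
    if not program.startswith("V"):
        raise ValueError("a sequence for this purpose must start with V")
    # Initial register values as given in the notes.
    reg = dict(a=255, b=0, g=0, m=1, p=0, r=0, x=0, y=0, w=1000)
    points, contours, layers = [], [], []
    for ch in program[1:]:
        if ch in "0123456789":
            reg["p"] = 10 * reg["p"] + int(ch)
        elif ch in ("a", "b", "g", "r"):
            # Colour or opacity component: also marks the glyph as coloured.
            reg[ch] = reg["p"]
            reg["p"] = 0
            reg["m"] = 0
        elif ch in ("x", "y", "w"):
            reg[ch] = reg["p"]
            reg["p"] = 0
        elif ch == "m":
            reg["m"] = 1
        elif ch == "p":
            reg["p"] = 0
        elif ch in ("n", "f"):
            # n is an on-curve point, f is an off-curve point.
            points.append((reg["x"], reg["y"], ch == "n"))
            reg.update(x=0, y=0, p=0)
        elif ch == "c":
            contours.append(points)  # assumption: goes into the contours buffer
            points = []
            reg.update(x=0, y=0, p=0)
        elif ch == "d":
            if reg["m"] == 1:
                colour = None  # the default monochrome glyph
            else:
                colour = (reg["r"], reg["g"], reg["b"], reg["a"])
            layers.append({"colour": colour, "contours": contours})
            contours, points = [], []
            reg.update(a=255, b=0, g=0, r=0, x=0, y=0, p=0, m=0)
        elif ch == "h":
            return {"advance": reg["w"], "layers": layers}
        else:
            raise ValueError(f"unknown command: {ch!r}")
    raise ValueError("sequence ended without an h command")


glyph = run("V250x900yn800x500yf250x100ync255b128gdh")
print(glyph)
```

The example string is the last example above with a leading V and a closing h added. Under the assumptions stated, it yields one glyph of colour (r, g, b, a) = (0, 128, 255, 255) with a single three-point contour and the default advance width of 1000.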

Re: Unicode Emoji 5.0 characters now final

2017-03-31 Thread Asmus Freytag

On 3/31/2017 3:38 PM, Doug Ewell wrote:

> What's wrong with "other" or "additional" in contrast to "recommended"
> or "preferred"? Or is the intent really to say "don't use these"?

People coming from the IETF background (that is, anyone familiar with how
RFCs are phrased) will read "recommended" not as "optional" but as
"required unless there are solid countervailing reasons" (my paraphrase).

I think the problem is that "recommended" is really meant that way
(vendors are very strongly encouraged not to select arbitrary subsets).
But the inverse, that is "not recommended", is not meant that way. It is
meant more along the lines of a "buyer beware", so that users do not get
their hopes up that all potential tag sequences will be interchangeable.

There's an inherent impossibility here if one tries to get this across in
one sentence. It requires two: one for vendors and one for users.

And Doug may be on to something when he suspects that these might become
paragraphs.

A./

PS: I like the suggestion that, if the recommended minimal subset really
reflects vendor agreement on interoperability, this be mentioned
explicitly.

RE: Unicode Emoji 5.0 characters now final

2017-03-31 Thread Doug Ewell

Peter Constable wrote:

> Would "are not very likely to be well-supported in common platforms or
> applications" work?

No, I think it should be even longer, maybe a paragraph or two, because
the concept of "A-list" versus "everything else" is just too complex and
unfamiliar to express concisely.

What's wrong with "other" or "additional" in contrast to "recommended"
or "preferred"? Or is the intent really to say "don't use these"?

--
Doug Ewell | Thornton, CO, US | ewellic.org

RE: Unicode Emoji 5.0 characters now final

2017-03-31 Thread Doug Ewell

Mark Davis wrote:

> Ken's observation "…approximately backwards…" is exactly right, and
> that's the same reason why Markus suggested something along the lines
> of "interoperable".

If the list was arrived at by members of the Consortium who are vendors
responsible for implementing (or not) emoji flags, then it would be good
to state this fact rather clearly and visibly. Otherwise it really does
look like UTC doing the recommending, and the recommending-against.

> I don't think we've come up with a pithy category name yet, but I
> tried different wording on the slides on http://unicode.org/emoji/.
> See what you think, Doug.

Slide 37 (speaker's notes) says:

"While at this point only three flags are on the recommended list,
implementations can provide other subdivision flags."

That's not a problem, except for being buried in speaker's notes. It
implies that all valid sequences are fine but some might not be
universally supported. That's normal for Unicode.

Slide 38 (slide and speaker's notes) says:

"Valid (but not recommended for vendors)"

Nope. That brings it right back to "Hey, vendors, Unicode recommends
that you don't support these." As I said Thursday, if that is the
intent, then don't change the wording; it's perfect as is.

The wordsmithing -- if that's all it is and not truly a warning-against
-- needs to apply primarily to the "not recommended" category. I
suggested "additional" to remove the explicit negative of "not
recommended" and "Standard? - No." In today's tread-lightly speech, "not
recommended" has the strong sense of "recommended against." Eating
poison ivy is Not Recommended.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final)

2017-03-31 Thread William_J_G Overington

Peter Constable wrote:

> The interest of consumers, in regard to emoji, will never be best met by 
> Unicode-encoded emoji, no matter what process there is for determining what 
> should be "recommended", because consumers inevitably want emoji they 
> recommend for themselves, not what anybody else recommends.

The consumers can only choose from what is available to consumers. So what the 
Unicode Technical Committee recommends or "not-recommends" may well have a very 
significant effect upon the choices available to the consumer.

> If Sally wants an emoji to convey her thoughts on her grandson's school play, 
> or on the latest tweet from a politician, or whatever, she wants it _now_, 
> and she doesn't particularly care if you or I would recommend that emoji to 
> her or not.

Sally may not know that the Unicode Technical Committee exists. Sally may have 
bought her computer or mobile telephone and just uses it, choosing from the 
emoji available in a menu system, perhaps never realizing all of the detailed 
standards work and implementation work that took place before the device was 
manufactured. It is not that Sally is having a particular emoji recommended to 
her as such, yet if the Unicode Technical Committee "not-recommends" 
implementation of some emoji that are in the standards document, then Sally may 
never get the opportunity to choose to use those emoji.

> So, before we go talking about whether _Unicode_ is accommodating the benefit 
> of consumers, I think should be asking whether _all the popular 
> communications protocols_ are accommodating the benefit of consumers.

Well, all of the various standards needed to produce useful products are 
important. It is not a matter of one being considered before the other. For a 
particular emoji to become available in a device that is available to a 
consumer there are various stages. They are like an AND gate where all inputs 
must be true in order for the result to be true.

The Unicode Technical Committee has enormous power and influence to affect the 
future of information technology.

It works both ways. Where an encoding is made there can be progress, yet where 
an idea is rejected then there is no way forward for an interoperable plain 
text encoding to become achieved.

I submitted a document in 2015. It was determined to be out of scope and was 
not included in the Document Register and the Unicode Technical Committee did 
not consider it.

I submitted a later version and received no reply about it at all.

So I cannot make progress over an interoperable plain text encoding becoming 
implemented at the present time. Quite a number of UTC meetings have taken 
place since.

Yet the scope of Unicode is a people-made rule, it could change if people with 
influence want it to change. The UTC could consider my document and hold a 
Public Review if it chose to do so.

So, the Unicode Technical Committee has enormous power and influence to affect 
the future of information technology.

When a "not-recommendation" of what to support takes place the decision to do 
that "not-recommending" can have significant and long-lasting effects on 
progress.

William Overington

Friday 31 March 2017

RE: Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final)

2017-03-31 Thread Peter Constable

William, you completely miss the point: As long as Unicode is the way to 
provide emoji to consumers, their needs and desires will not be best or fully 
met. Unicode as an AND gate is too many AND gates.

Peter

Sent from my Windows 10 phone


Re: [OT] Europe vs. European Union (was: Re: Unicode Emoji 5.0 characters now final)

2017-03-31 Thread Doug Ewell

Manuel Strehl wrote:

> Maybe I'm missing context, but what is the specific problem of those
> lists differing? 
>
> The EU and Europe _are_ two different things. The United States of
> America similarly do not include the whole of America, despite the
> name.

A previous offshoot of the flag thread had veered into discussion of the
UN code element for Europe, and the ISO exceptionally reserved code
element for the EU, and the lists of countries in each, and something
about WIPO and ITU and ccTLDs.

I was pointing out what you said, that the lists differ by nature and
comparing them is a fruitless exercise.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-31 Thread Mark Davis ☕️

Ken's observation "…approximately backwards…" is exactly right, and that's
the same reason why Markus suggested something along the lines of
"interoperable".

I don't think we've come up with a pithy category name yet, but I tried
different wording on the slides on http://unicode.org/emoji/. See what you
think, Doug.

Mark

On Thu, Mar 30, 2017 at 4:58 PM, Doug Ewell wrote:

> Asmus Freytag wrote:
>
> > Recommending to vendors to support a minimal set is one thing.
> > Recommending to users to only use sequences from that set / or vendors
> > to not extend coverage beyond the minimum is something else. Both use
> > the word "recommendation" but the flavor is rather different (which
> > becomes more obvious when you re-phrase as I suggested).
> >
> > That seems to be the source of the disconnect.
>
> That seems a fair analysis.
>
> Another way of putting this is that marking a particular subset of valid
> sequences as "recommended" is one thing, while listing sequences in a
> table with a column "Standard sequence?", with some sequences marked
> "Yes" and others marked "No," is something else.
>
> Equivalently, characterizing a group of valid sequences as "Valid, but
> not recommended" is something else.
>
> If the goal is to tell users that three of the sequences are especially
> likely to be supported, or to tell vendors that they should prioritize
> support for these three, then "recommended" and "additional," used as a
> pair, would be more appropriate.
>
> If the goal is to tell users "we don't want you to use the other 5100
> sequences" and to tell vendors "we don't want you to offer support for
> them," then the existing wording is fine.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>

Re: [OT] Europe vs. European Union (was: Re: Unicode Emoji 5.0 characters now final)

2017-03-31 Thread Manuel Strehl

Maybe I'm missing context, but what is the specific problem of those lists
differing?

The EU and Europe _are_ two different things. The United States of America
similarly do not include the whole of America, despite the name.

And Norway and Switzerland and some others (incl. soon England) might not
be too happy with either institution to make a forced move to unify those
lists.

–Manuel

2017-03-30 23:39 GMT+02:00 Doug Ewell:

> The UN "M49 Standard" (that's how they're styling it now; I guess we
> should stop writing "M.49") assigns a code element for each "country or
> area" and groups these into "geographical regions."
>
> To find the "countries or areas" included within code element 150 for
> "Europe," simply visit https://unstats.un.org/unsd/methodology/m49/ ,
> select Geographic Regions from the menu at the left, and expand the
> entries for Europe and its four subregions. The lists are available in
> six languages, including French.
>
> To find the countries that make up the European Union at any given
> moment, visit http://europa.eu/european-union/about-eu/countries_fr (or
> similar for other EU languages). As is well known, this list has changed
> in the past and will change in the future.
>
> The point is that UNSD's definition of Europe and the roster of the
> European Union are different lists, and no attempt is made by either
> organization to make these lists identical or to explain or justify
> differences.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread gfb hjjhjh

On the topic, I am surprised to see that the only large Chinese company in
the member list is Huawei, with none of the large Chinese internet
companies, including Baidu, Alibaba, Tencent, Sina and Netease,
participating in Unicode. In the associate member list there is a company
named zhongyi, but that link already returns a 404...
>
> On 30 March 2017 at 15:51, "Mark Davis ☕️" wrote:
>>
>>> If I made an open-source emoji font that contained flags for all of the
>>> 5000ish ISO 3166-2 codes that actually map to one, would I automatically
>>> be considered a vendor?
>>>
>>> Do I need to have to pay 18000(?) dollars a year for full membership
>>> first? (That's peanuts for multi-billion dollar companies, but
>>> unaffordable for most individuals and many FOSS projects.)
>>
>> The answer to both of your questions is no.
>>
>> Please see http://unicode.org/emoji/selection.html#timeline for details.
>> What the UTC is looking for is commitments from major vendors. It is not
>> sufficient to join Unicode: we have members who are not major vendors of
>> emoji. And there are some major vendors that are not members.
>>
>> Of course, there is some judgment involved as to what constitutes
>> "major": at one extreme clearly 1B DAUs qualifies, and at the other
>> extreme, 1K doesn't.
>>
>> Mark

[OT] Europe vs. European Union (was: Re: Unicode Emoji 5.0 characters now final)

2017-03-30 Thread Doug Ewell

The UN "M49 Standard" (that's how they're styling it now; I guess we
should stop writing "M.49") assigns a code element for each "country or
area" and groups these into "geographical regions."

To find the "countries or areas" included within code element 150 for
"Europe," simply visit https://unstats.un.org/unsd/methodology/m49/ ,
select Geographic Regions from the menu at the left, and expand the
entries for Europe and its four subregions. The lists are available in
six languages, including French.

To find the countries that make up the European Union at any given
moment, visit http://europa.eu/european-union/about-eu/countries_fr (or
similar for other EU languages). As is well known, this list has changed
in the past and will change in the future.

The point is that UNSD's definition of Europe and the roster of the
European Union are different lists, and no attempt is made by either
organization to make these lists identical or to explain or justify
differences. 
 
--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Philippe Verdy

2017-03-30 11:48 GMT+02:00 Christoph Päper:

> Philippe Verdy wrote on 30 March 2017 at 00:40:
>
> > There's no ISO 3166-1 code for Europe as a whole (does it exist
> > legally if we can't clearly define its borders?)
>
> `150` in UN M.49, which ISO 3166-1 was derived from and is compatible
> with. CLDR could safely adopt that if needed.
>

I have not seen a clear statement that UN M.49 code 150 for Europe (as a
whole) is related to the EU assignment in ISO 3166-1, which refers to the
European Union (but in fact still refers legally to the European Community,
the only part legally recognized; even though the European Union attempted
to unify the communities, this unification was partial, and three separate
"pillars" were kept). I've clearly read that EU was assigned in ISO 3166
only because of its use in WIPO standards. There are some other assignments
made for keeping compatibility with ITU standards, or with the Postal Union.

Note that the ITU also defines a "European broadcasting region" that covers
North Africa and some countries of the Middle East: it is the basis for the
existence of the EBU (Eurovision), the second basis being the Council of
Europe, one or the other being a requirement for full membership. The ITU
definition is appropriate because it matches the coverage areas of
satellites.

So I don't think there is any equivalence between code 150 and code EU
(which includes parts outside 150, for example some of the French and Dutch
overseas dependencies in America and Africa).

After the "Brexit" we don't know if GB will still be part of EU for WIPO
standards. But British domain names registered in the ".eu" ccTLD will
remain valid (the TLD is not bound to the same rules as WIPO standards). As
far as I have seen, GB will keep its existing status in WIPO, so it should
still be part of the "EU" code in ISO 3166-1, unless its own membership in
WIPO is amended (I doubt it will ever happen; GB would lose some of its
existing IP rights protection).

Tailoring the Marketplace (is: Re: Unicode Emoji 5.0 characters now final)

2017-03-30 Thread Marcel Schneider

On Thu, 30 Mar 2017 15:03:11 +0100 (BST), William_J_G Overington wrote:
> 
> > What the UTC is looking for is commitments from major vendors.
> 
> Well should it be applying such a filter on progress?
> 
> I opine that assessment should be on merit and that new ideas should be 
> considered on an even-handed basis. Progress should not be on the basis of 
> what major vendors choose to do. Requiring commitments from major vendors 
> could be a barrier to new enterprises developing and a barrier to progress 
> for the benefit of consumers being made.

Thatʼs exactly the point: that the marketplace should be tailored for the 
benefit of consumers, not for the sole benefit of vendors. Instead, the 
question seems always to be “who is paying for it?” Another example has been 
recently discussed: the use of superscript letters is “discouraged”, seemingly 
to prevent a set of consumers from being able to write in an acceptable way a 
couple of languages in plain text, and to subjugate these customers to the use 
of a series of rich text software. The problem is not whether to use high-end 
software or not, but the way users get their stuff messed up if they donʼt.

When it came to encoding the first set of superscript Latin letters in 
Unicode 1.0 — or were they *too* enforced by Bruce Paterson of ISO/IEC 10646? — 
all straightforward people surely were going to follow the pattern of:

2071 SUPERSCRIPT LATIN SMALL LETTER I
* functions as a modifier letter
# <super> 0069
207F SUPERSCRIPT LATIN SMALL LETTER N
* functions as a modifier letter
# <super> 006E
@ Latin subscript modifier letters
1D62 LATIN SUBSCRIPT SMALL LETTER I
# <sub> 0069
1D63 LATIN SUBSCRIPT SMALL LETTER R
# <sub> 0072
1D64 LATIN SUBSCRIPT SMALL LETTER U
# <sub> 0075
1D65 LATIN SUBSCRIPT SMALL LETTER V
# <sub> 0076
and name them accordingly. But given the way of finally calling them:

@@ 02B0 Spacing Modifier Letters 02FF
@ Latin superscript modifier letters
x (superscript latin small letter i - 2071)
x (superscript latin small letter n - 207F)
02B0 MODIFIER LETTER SMALL H
* aspiration
# <super> 0068
02B1 MODIFIER LETTER SMALL H WITH HOOK

and so on, somebody must have stood up and said “Wait! If we label them as 
what they are, folks will use these instead of our software, so letʼs 
disguise them a bit!” As a result, weʼve ended up with every script on earth 
being writeable in plain text except Latin. That seems to be an abuse of 
dominant position, to make an unknown amount of additional profit at the 
expense of a relatively narrow subset of disfavored end-users, as if the 
usefulness of vendorsʼ software essentially depended on one single feature: 
superscript formatting.

Regards,
Marcel

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Doug Ewell

William_J_G Overington wrote:

>> Of course, there is some judgment involved as to what constitutes
>> "major": at one extreme clearly 1B DAUs qualifies, and at the other
>> extreme, 1K doesn't.
>
> What does 1B DAUs mean please?

From http://acronyms.thefreedictionary.com/DAU I gathered that this
might be search-engine industry jargon for "1 billion daily active
users" as opposed to 1000 of them. 
 
--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread William_J_G Overington

> What the UTC is looking for is commitments from major vendors.

Well should it be applying such a filter on progress?

I opine that assessment should be on merit and that new ideas should be 
considered on an even-handed basis. Progress should not be on the basis of what 
major vendors choose to do. Requiring commitments from major vendors could be a 
barrier to new enterprises developing and a barrier to progress for the benefit 
of consumers being made.

> Of course, there is some judgment involved as to what constitutes "major": at 
> one extreme clearly 1B DAUs qualifies, and at the other extreme, 1K doesn't.

What does 1B DAUs mean please?

William Overington

Thursday 30 March 2017

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Doug Ewell

Asmus Freytag wrote:

> Recommending to vendors to support a minimal set is one thing.
> Recommending to users to only use sequences from that set / or vendors
> to not extend coverage beyond the minimum is something else. Both use
> the word "recommendation" but the flavor is rather different (which
> becomes more obvious when you re-phrase as I suggested).
>
> That seems to be the source of the disconnect.

That seems a fair analysis.

Another way of putting this is that marking a particular subset of valid
sequences as "recommended" is one thing, while listing sequences in a
table with a column "Standard sequence?", with some sequences marked
"Yes" and others marked "No," is something else.

Equivalently, characterizing a group of valid sequences as "Valid, but
not recommended" is something else.

If the goal is to tell users that three of the sequences are especially
likely to be supported, or to tell vendors that they should prioritize
support for these three, then "recommended" and "additional," used as a
pair, would be more appropriate.

If the goal is to tell users "we don't want you to use the other 5100
sequences" and to tell vendors "we don't want you to offer support for
them," then the existing wording is fine.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Mark Davis ☕️

> `150` in UN M.49 which ISO 3166-1 was derived from and is compatible with.
> CLDR could safely adopt that if needed.

No need to "safely adopt". It is already valid:

http://www.unicode.org/reports/tr51/proposed.html#flag-emoji-tag-sequences

If you follow the links you'll end up at

http://unicode.org/repos/cldr/trunk/common/validity/region.xml

And find that 150 is already valid. (For the format of that file, see LDML.)

Where people have looked at the documentation and their questions are still
not answered, that feedback is useful so that the documentation can be
improved. But it appears that at least some people haven't bothered to do
that, when it could answer a lot of the questions/complaints on this list.

Mark

On Thu, Mar 30, 2017 at 11:48 AM, Christoph Päper <
christoph.pae...@crissov.de> wrote:

> Philippe Verdy wrote on 30 March 2017 at 00:40:
>
> > There's no ISO 3166-1 code for Europe as a whole (does it exist
> > legally if we can't clearly define its borders?)
>
> `150` in UN M.49 which ISO 3166-1 was derived from and is compatible with.
> CLDR could safely adopt that if needed.
>
> No alpha-2 and hence no RIS sequence, though. An Emoji Tag Sequence would
> be straight-forward, though: U+1F3F4-E0031-E0035-E0030-E007F.
>
>

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Christoph Päper

Charlotte Buff:
>
> Heck, if your device has a default font that includes CANCEL TAG (...) and
> therefore doesn’t render it, then you won’t even be able to see the
> difference between a regular, generic black flag and an emoji that was meant
> to represent some region. This could potentially lead to great
> misunderstandings since a plain black flag is often associated with anarchism
> and piracy, but rather rarely with England, Scotland or Wales. The waving
> white flag that was used as the base in earlier drafts at the very least had
> the benefit of looking like a “blank slate” of sorts.

White flags are associated with surrender (but also peace). That is at least as
bad as a black flag. The checkered flag U+1F3C1  could have been a compromise.
It is also readily associated with sports.

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Christoph Päper

Philippe Verdy wrote on 30 March 2017 at 00:40:

> There's no ISO 3166-1 code for Europe as a whole (does it exist legally if
> we can't clearly define its borders?)

`150` in UN M.49 which ISO 3166-1 was derived from and is compatible with. CLDR
could safely adopt that if needed.

No alpha-2 and hence no RIS sequence, though. An Emoji Tag Sequence would be
straight-forward, though: U+1F3F4-E0031-E0035-E0030-E007F.
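
As an illustration of how such a sequence is put together, here is a minimal
Python sketch. It assumes only what the thread states: the base is U+1F3F4
WAVING BLACK FLAG, each letter or digit of the code maps to the tag character
at U+E0000 plus its ASCII value, and U+E007F CANCEL TAG closes the sequence.
The function names are made up for this example.

```python
BLACK_FLAG = 0x1F3F4   # WAVING BLACK FLAG, the base of the sequence
TAG_OFFSET = 0xE0000   # tag characters mirror ASCII at U+E0000 + code point
CANCEL_TAG = 0xE007F   # CANCEL TAG, terminates the sequence


def flag_tag_sequence(code: str) -> str:
    """Return the emoji tag sequence for a code such as '150' or 'gbsct'."""
    code = code.lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError("code must consist of ASCII letters or digits")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in code)
    return chr(BLACK_FLAG) + tags + chr(CANCEL_TAG)


def as_code_points(text: str) -> str:
    """Format a string as hyphen-separated hexadecimal code points."""
    return "-".join(f"{ord(c):04X}" for c in text)


print("U+" + as_code_points(flag_tag_sequence("150")))
# prints U+1F3F4-E0031-E0035-E0030-E007F
```

Whether a given font or platform renders the result as a flag is a separate
question; the sketch only builds the code point sequence.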

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Martin J. Dürst


On 2017/03/30 06:17, Christoph Päper wrote:
> Mark Davis ☕️ :
>
>> That isn't really the case. In particular, vendors can propose adding
>> additional subdivisions to the recommended list.
>
> Awesome, "vendors" can do that. (._.m)
>
> If I made an open-source emoji font that contained flags for all of the 5000ish
> ISO 3166-2 codes that actually map to one, would I automatically be considered a
> vendor?

I don't think so. But if you want to get more flags listed, then
creating actual flags, with suitable licenses, and telling others to use
them and tell others, and so on, may easily reach vendors sooner or later.

> - 
> - 
> - 
> -  <-
>
> The last one currently already has support for UK countries, US states and
> Canadian provinces. Go figure.

And most if not all of these flags are from Wikimedia. So that shows
that open source has some influence, even without money.


Regards,   Martin.

Re: Unicode Emoji 5.0 characters now final

2017-03-30 Thread Mark Davis ☕️

> If I made an open-source emoji font that contained flags for all of the
> 5000ish ISO 3166-2 codes that actually map to one, would I automatically be
> considered a vendor?
>
> Do I need to have to pay 18000(?) dollars a year for full membership
> first? (That's peanuts for multi-billion dollar companies, but
> unaffordable for most individuals and many FOSS projects.)
>

The answer to both of your questions is no.

Please see http://unicode.org/emoji/selection.html#timeline for details.
What the UTC is looking for is commitments from major vendors. It is not
sufficient to join Unicode: we have members who are not major vendors of
emoji. And there are some major vendors that are not members.

Of course, there is some judgment involved as to what constitutes "major":
at one extreme clearly 1B DAUs qualifies, and at the other extreme, 1K
doesn't.

Mark

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Philippe Verdy

2017-03-30 1:29 GMT+02:00 Richard Wordingham <
richard.wording...@ntlworld.com>:

> On Thu, 30 Mar 2017 00:52:03 +0200
> Charlotte Buff  wrote:
>
> > And this is where the problem becomes even worse. Because there are no
> > “flag tofus” for 3166-2 regions. Unlike Regional Indicator Sequences,
> > the fallback for all unsupported tag sequences looks exactly the same
> > and carries absolutely no meaning unless put through some Unicode
> > analyzer machine:  WAVING BLACK FLAG, a well-supported emoji that
> > means nothing in the context it is used in, followed by a single,
> > featureless tofu. At least a text containing ten different
> > unsupported RI sequences will show you ten distinct images, even if
> > you are completely unaware that those peculiar pairs of colourful
> > letters you’ve just been sent are used to build flag emoji.
>
> I don't see why the tag characters can't be represented by some form of
> corresponding ASCII characters as a fallback rendering.  The
> bracketing pair U+1F3F4 WAVING BLACK FLAG .. U+E007F CANCEL TAG
> declares a sequence of 3 to 6 intervening ordinary tags to be a flag
> emoji, and in an OpenType font a GSUB contextual substitution can
> easily convert unrecognised sequences to modified ASCII characters.  It
> does not have to explicitly handle each possible combination.
>

I also think so: the unique black flag (even if it is marked on the corner
with a ? on a diamond) is the worst solution. You can easily set up a
left-side part showing the hoist and the start of the flag, a right part
showing the floating end of the flag, and display the letters with top and
bottom borders connecting together and with the left-side and right-side
part. Maybe you can also arrange the letters in rows: the first top row
for the 2-letter ISO 3166-1 code, the bottom row for the appended 1-to-4
character code (letters and digits) of the subdivision.

You may also improve the display by displaying the last letters on top of
the national flag. If subdivision codes are known you may alternatively
render a short name of the subdivision above or below the national flag
(but here there's a problem of language choice: even if official names are
accepted, some subdivisions have several official names in distinct
languages, possibly in distinct scripts; and when there's only one,
probably many users will have problems reading these labels in a foreign
script, such as Arabic or Chinese)

My opinion is that renderers should instead support the interactive display
of hints in the user language of their UI, independently of the language of
the encoded document itself, if the rendering engine is capable of such
interactivity, provided that there's no other competing hint such as title
attributes, which may be used in HTML to explain the flag even when it is
actually rendered. The same will apply for non-graphical rendering such as
aural rendering, instead of spelling the code letters (as a last fallback).

Maybe it will be larger than an actual flag, but I see no problem at all
if all flags do not have the same ratio (in fact ratios are already not the
same for the official flags of recognized countries). There is absolutely
no obligation

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Charlotte Buff

Richard Wordingham wrote:

> I don't see why the tag characters can't be represented by some form of
> corresponding ASCII characters as a fallback rendering. The
> bracketing pair U+1F3F4 WAVING BLACK FLAG .. U+E007F CANCEL TAG
> declares a sequence of 3 to 6 intervening ordinary tags to be a flag
> emoji, and in an OpenType font a GSUB contextual substitution can
> easily convert unrecognised sequences to modified ASCII characters. It
> does not have to explicitly handle each possible combination.

I suppose this is an adequate solution, but it’s also needlessly
convoluted in comparison to RIS where good fallback behaviour just happens
automatically with only the most bare-bones font feature imaginable, i.e.
simply displaying single characters one after another as they would appear
anyways. It is also questionable whether most vendors are going to employ
such a system in the first place.

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Richard Wordingham

On Thu, 30 Mar 2017 00:52:03 +0200
Charlotte Buff  wrote:

> And this is where the problem becomes even worse. Because there are no
> “flag tofus” for 3166-2 regions. Unlike Regional Indicator Sequences,
> the fallback for all unsupported tag sequences looks exactly the same
> and carries absolutely no meaning unless put through some Unicode
> analyzer machine:  WAVING BLACK FLAG, a well-supported emoji that
> means nothing in the context it is used in, followed by a single,
> featureless tofu. At least a text containing ten different
> unsupported RI sequences will show you ten distinct images, even if
> you are completely unaware that those peculiar pairs of colourful
> letters you’ve just been sent are used to build flag emoji.

I don't see why the tag characters can't be represented by some form of
corresponding ASCII characters as a fallback rendering.  The
bracketing pair U+1F3F4 WAVING BLACK FLAG .. U+E007F CANCEL TAG
declares a sequence of 3 to 6 intervening ordinary tags to be a flag
emoji, and in an OpenType font a GSUB contextual substitution can
easily convert unrecognised sequences to modified ASCII characters.  It
does not have to explicitly handle each possible combination.
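
To make the idea concrete, here is a rough Python sketch of the same fallback
done at the text level rather than in a font. It is not a GSUB lookup, and the
bracketed output format is only an assumption chosen for illustration; the
point is that one generic rule covers every sequence without listing them.

```python
import re

# WAVING BLACK FLAG, one or more printable tag characters, then CANCEL TAG.
TAG_SEQUENCE = re.compile("\U0001F3F4([\U000E0020-\U000E007E]+)\U000E007F")


def visible_fallback(text: str) -> str:
    """Rewrite every flag tag sequence as the black flag plus visible ASCII."""
    def replace(match: re.Match) -> str:
        # Each tag character is its ASCII counterpart shifted by 0xE0000.
        code = "".join(chr(ord(c) - 0xE0000) for c in match.group(1))
        return "\U0001F3F4[" + code + "]"
    return TAG_SEQUENCE.sub(replace, text)


# Scotland: black flag, tag characters for "gbsct", CANCEL TAG.
scotland = "\U0001F3F4" + "".join(chr(0xE0000 + ord(c)) for c in "gbsct") + "\U000E007F"
print(visible_fallback(scotland) == "\U0001F3F4[gbsct]")  # prints True
```

A real implementation would apply this only to sequences the font cannot
render; the sketch rewrites all of them.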

Richard.

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Charlotte Buff

Ken Whistler wrote:
> *But*, the ones who do have flags on their
> phones don't want to be in the situation where the iPhone has a flag of
> Scotland which then shows up as a flag tofu on an Android phone, but an
> Android phone has a flag of Texas which then shows up as a flag tofu on
> an iPhone, etc., etc. That way leads to customer complaint madness, with
> 1000's (hundreds of 1000's?) of complaints: "My phone is screwed up, fix
> it!"

And this is where the problem becomes even worse. Because there are no
“flag tofus” for 3166-2 regions. Unlike Regional Indicator Sequences, the
fallback for all unsupported tag sequences looks exactly the same and
carries absolutely no meaning unless put through some Unicode analyzer
machine:  WAVING BLACK FLAG, a well-supported emoji that means nothing in
the context it is used in, followed by a single, featureless tofu. At least
a text containing ten different unsupported RI sequences will show you ten
distinct images, even if you are completely unaware that those peculiar
pairs of colourful letters you’ve just been sent are used to build flag
emoji.

Heck, if your device has a default font that includes CANCEL TAG (like my
phone does, but my laptop doesn’t) and therefore doesn’t render it, then
you won’t even be able to see the difference between a regular, generic
black flag and an emoji that was meant to represent some region. This could
potentially lead to great misunderstandings since a plain black flag is
often associated with anarchism and piracy, but rather rarely with England,
Scotland or Wales. The waving white flag that was used as the base in
earlier drafts at the very least had the benefit of looking like a “blank
slate” of sorts.

This is one of the few cases where the terrible web browser of the Nintendo
3DS can actually be considered superior to any modern device because for
some bizarre reason it applies modulo 65,536 to all code points on display,
resulting in tag characters rendering as visible ASCII.
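
A small Python sketch shows why that arithmetic would produce readable text,
assuming the browser behaves as described above (I have not verified the
device itself). Tag characters sit at U+E0000 plus the ASCII value, so taking
the code point modulo 65,536 (0x10000) leaves just the ASCII value.

```python
def fold_modulo_65536(text: str) -> str:
    """Reduce every code point modulo 0x10000, as the browser is said to do."""
    return "".join(chr(ord(c) % 0x10000) for c in text)


# The five tag characters that spell "gbsct" inside the Scotland sequence.
tags = "".join(chr(0xE0000 + ord(c)) for c in "gbsct")
print(fold_modulo_65536(tags))  # prints gbsct
```

The same folding turns U+1F3F4 into U+F3F4, a private-use code point, so the
flag itself would be lost while the tags become legible.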

It would have been much more sensible to construct subdivision flags out of
new, visible characters just like RI sequences. That way we could have had
a fallback rendering that is actually in any way useful. We could also have
preserved the original properties of the tag characters. Last time I
checked their correct usage for language tagging is still rigorously
explained in the standard despite deprecation.

But no, we absolutely had to put out this update as soon as possible
because peoplez want da emojiz. We had to use existing characters for
region sequences because if we had actually given ourselves enough time to
properly think this whole endeavour through we couldn't have made the
precious Scottish flag available until Unicode 11. (Although that hardly
seems to matter anyways seeing how we apparently now release technical
reports and data files that rely on certain characters before those
characters even exist in the standard.) And we had to use the invisible tag
characters from Plane 14 because potatoes, I guess.

You know, back when Emoji Modifiers were released I was initially sceptical
of them being spacing, visibly rendering pictographs rather than formatting
characters. Nowadays I understand that decision. Too bad we were seemingly
unable to make the same decision for flags. I eagerly await the return of
hair colour tags in Emoji 6.

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Philippe Verdy

Note: in your collection you say that the EU flag is the flag of the
European Union; actually it is a flag for Europe as a whole, designed and
proposed long ago by the CoE, the Council of Europe (not by the European Union,
which did not yet exist, nor by the EEC or even the CECA, which were
also created after the Council of Europe).

The European Union displays the CoE flag **under permission** permanently
granted by the Council of Europe. The non-EU countries that are CoE members,
or that were invited by the CoE, have a legal right to display it (so this
includes Turkey, from the start, as it was a founding member of the CoE, and
also Russia, Belarus even if its seat in the CoE is suspended, Ukraine,
Kazakhstan, Morocco, Vatican, Andorra, Iceland, Switzerland, Liechtenstein,
Norway...). When the CECA was created (and later the European Communities)
it initially had no flag, but it rapidly started to reuse the European flag
proposed by the CoE, because every member of the European Community was also
a member of the CoE.

In ISO 3166-1 however the "EU" code was granted to the European Union (for
legal reasons related to some WIPO standards with specific rules enforced
throughout the EU, plus optionally some volunteer countries in the EEA). It
usually displays the flag adopted by the CoE. There's no ISO 3166-1 code
for Europe as a whole (does it exist legally if we can't clearly define
its borders?) or for the CoE itself (which now has a logo derived from the
European flag, but distinctive and reserved as a logo, and not encodable).

Note that there's also a flag for a wider region with 56 countries covered
by the EBU (European Broadcasting Union), including for example Israel,
Palestine, Armenia, Georgia, Syria, Lebanon, Morocco, Algeria, Tunisia,
Libya and Egypt (not to be confused with the logos used by the Eurovision
Song Contest: these logos are not flags). However the EBU still does not
include Kazakhstan. The EBU, however, is a private organization, and its "flag",
looking like a blue "(O)" on white, is in fact a logo and not encodable.
Another logo was used in the past that looked similar to the European flag
with stars in a circle (this old logo, initially monochromatic using white
stars on grey, slightly modernized, is still visible along with some video
test patterns at the start of some Eurovision broadcasts).

2017-03-29 23:52 GMT+02:00 Rebecca Bettencourt :

> On Wed, Mar 29, 2017 at 2:17 PM, Christoph Päper <
> christoph.pae...@crissov.de> wrote:
>
>> If I made an open-source emoji font that contained flags for all of the
>> 5000ish ISO 3166-2 codes that actually map to one, would I automatically be
>> considered a vendor? Do I need to have to pay 18000(?) dollars a year for
>> full membership first? (That's peanuts for multi-billion dollar companies,
>> but unaffordable for most individuals and many FOSS projects.)
>
> ...
>
>> Those are desired, for sure, but so are emoji flags for Kurdistan,
>> Confederated States of America, Romani, Oromo, South Vietnam, Esperanto,
>> Anarchy, Communism, Bisexuality, Transgenderism, Sami, Pan-Africanism,
>> Australian Aboriginals, and many more. Of these, only the Kurdish and the
>> Sami flag *may* be covered by Unicode Emoji 5.0+ (possibly with multiple
>> codes) until yet another (Tag-based) scheme is adopted.
>>
>
> Heh, I actually started an open-source emoji font that kinda does this:
>
> https://github.com/kreativekorp/vexillo
>
> It encodes not only some subdivision flags using sequences like [usca],
> [ustx], and [caqc], but a whole lot of nowhere-near-standardized-for-encoding
> flags under the XX code, such as [xxcascadia], [xxconlangesperanto],
> [xxpridebisexual], [xxpridetrans], etc.
>
> And hey, it works already in OS X 10.8+ and Firefox, even if it makes text
> selection a little dodgy. :)
>
>
>
>

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Asmus Freytag


On 3/29/2017 2:07 PM, Doug Ewell wrote:
> Ken Whistler wrote:
>
>> *But*, the ones who do have flags on their phones don't want to be in
>> the situation where the iPhone has a flag of Scotland which then shows
>> up as a flag tofu on an Android phone, but an Android phone has a flag
>> of Texas which then shows up as a flag tofu on an iPhone, etc., etc.
>> That way leads to customer complaint madness, with 1000's (hundreds of
>> 1000's?) of complaints: "My phone is screwed up, fix it!"
>
> Doesn't this same problem exist for other emoji, or non-emoji, that are
> supported on some phones but not others? What's the customer service
> resolution in those cases?


Sure, let them go form a consortium and agree on which ones are in the 
recommended set. But why form a new consortium if you have one already 
where they are all members?


Agreeing on recommended level of support in the sense of "best practice" 
is something that is done for many of the specifications, for example 
some of the algorithms.


A useful guide in evaluating whether it's appropriate to "recommend" 
something is to treat it as if it was mandatory, but with a costly 
override option: if you decide to go against the recommendation you'd 
better have a really solid reason.


Recommending to vendors to support a minimal set is one thing. 
Recommending to users to only use sequences from that set / or vendors 
to not extend coverage beyond the minimum is something else. Both use 
the word "recommendation" but the flavor is rather different (which 
becomes more obvious when you re-phrase as I suggested).


That seems to be the source of the disconnect.

A./

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Rebecca Bettencourt

On Wed, Mar 29, 2017 at 2:17 PM, Christoph Päper <
christoph.pae...@crissov.de> wrote:

> If I made an open-source emoji font that contained flags for all of the
> 5000ish ISO 3166-2 codes that actually map to one, would I automatically be
> considered a vendor? Do I need to have to pay 18000(?) dollars a year for
> full membership first? (That's peanuts for multi-billion dollar companies,
> but unaffordable for most individuals and many FOSS projects.)

...

> Those are desired, for sure, but so are emoji flags for Kurdistan,
> Confederated States of America, Romani, Oromo, South Vietnam, Esperanto,
> Anarchy, Communism, Bisexuality, Transgenderism, Sami, Pan-Africanism,
> Australian Aboriginals, and many more. Of these, only the Kurdish and the
> Sami flag *may* be covered by Unicode Emoji 5.0+ (possibly with multiple
> codes) until yet another (Tag-based) scheme is adopted.
>

Heh, I actually started an open-source emoji font that kinda does this:

https://github.com/kreativekorp/vexillo

It encodes not only some subdivision flags using sequences like [usca],
[ustx], and [caqc], but a whole lot of
nowhere-near-standardized-for-encoding flags under the XX code, such as
[xxcascadia], [xxconlangesperanto], [xxpridebisexual], [xxpridetrans], etc.

And hey, it works already in OS X 10.8+ and Firefox, even if it makes text
selection a little dodgy. :)

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Christoph Päper

Richard Wordingham :
> "Doug Ewell"  wrote:
> 
> > "Not recommended," "not standard," "not interoperable," or any other
> > term ESC settles on for the 5000+ valid flag sequences that are not
> > England, Scotland, and Wales is just a short, easy step away from
> > deprecation for these as well.

*Sigh* 
Instead of 26 RIS characters and all the TAGs, Unicode should have added a
single new character: U+2065 Flag Code Joiner.

> It's certainly on the cards that the sequence for the Scottish flag will
> be deprecated in favour of an RI sequence.

Which would very likely be U+1F1E6-1F1E7  'AB' for Alba, because all other
intuitive alpha-2 code elements are either reserved or already assigned.
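
For reference, the mapping from an alpha-2 code to a Regional Indicator
sequence is simple arithmetic. The Python sketch below assumes only that the
26 regional indicator symbols run consecutively from U+1F1E6 (letter A); the
function name is invented for this example, and 'AB' is the hypothetical code
discussed above, not an assigned one.

```python
RI_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def regional_indicator_sequence(alpha2: str) -> str:
    """Map a two-letter code such as 'AB' to its pair of RI symbols."""
    alpha2 = alpha2.upper()
    if len(alpha2) != 2 or not all("A" <= c <= "Z" for c in alpha2):
        raise ValueError("expected exactly two ASCII letters")
    return "".join(chr(RI_BASE + ord(c) - ord("A")) for c in alpha2)


pair = regional_indicator_sequence("AB")
print("U+" + "-".join(f"{ord(c):X}" for c in pair))  # prints U+1F1E6-1F1E7
```

An unsupported pair still renders as two visible letter symbols, which is the
fallback behaviour praised elsewhere in this thread.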

Traction and Deprecation (was: Re: Unicode Emoji 5.0 characters now final)

2017-03-29 Thread Ken Whistler



On 3/29/2017 1:12 PM, Doug Ewell wrote:

> Is that common practice in Unicode, that if something doesn't gain
> significant traction in the comparatively short term, it becomes a
> candidate for deprecation?


If a mechanism was dodgy in the first place and was dubious as a part of 
plain text, then yes.


If a mechanism is clearly a necessary part of the text model, but takes 
a while to catch on, because it is inherently complicated to implement 
and roll out, then no.


Remember, it took a good part of a decade for significant support of 
combining marks to start appearing in Unicode implementations. Even 
longer for fairly good support of the Indic rendering models.


If you are worried about the emoji tag sequence mechanism, then I'd say 
no. Once the use of regional indicator symbols caught on to represent 
flag emoji, that basically settled the question of whether pictographic 
symbols for flags were a part of plain text. Once the emoji tag 
sequences are rolled out for the regional flags (a process I can surmise 
is happening even now as we debate this), there will be no going back. 
You can be guaranteed, given the current attention to Brexit, that the 
tag sequence for the Scotland flag, at least, will leap up the emoji 
frequency list almost immediately. And that data will end up being 
supported essentially forever.


--Ken

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Christoph Päper

Mark Davis ☕️ :
> On Tue, Mar 28, 2017 at 11:56 AM, Joan Montané  wrote:
> 
> > 1st one: point 4 (Unicode subdivision codes listed in emoji Unicode site)
> > gives rise to something like a chicken-and-egg problem. Vendors don't easily add new
> > subdivision-flags (because they aren't recommended), and Unicode doesn't
> > recommend new subdivision flags (because they aren't supported by vendors).
> 
> That isn't really the case. In particular, vendors can propose adding
> additional subdivisions to the recommended list.

Awesome, "vendors" can do that. (._.m)

If I made an open-source emoji font that contained flags for all of the 5000ish
ISO 3166-2 codes that actually map to one, would I automatically be considered a
vendor? Do I need to have to pay 18000(?) dollars a year for full membership
first? (That's peanuts for multi-billion dollar companies, but unaffordable for
most individuals and many FOSS projects.)

Someone could try to push such an edit onto Emojione, Twemoji or Noto Emoji, but
something tells me none of the maintainers would accept flag PRs by random users
unless UTR/UTS#51 already recommended them.

- 
- 
- 
-  <-

The last one currently already has support for UK countries, US states and
Canadian provinces. Go figure.

> The UTC Considerations ... would come into play in assessing those proposals.
> So it is certainly possible for there to be (say) a flag of Texas or
>Catalonia
> appearing in an Emoji 6.0 release this year. 

Those are desired, for sure, but so are emoji flags for Kurdistan, Confederated
States of America, Romani, Oromo, South Vietnam, Esperanto, Anarchy, Communism,
Bisexuality, Transgenderism, Sami, Pan-Africanism, Australian Aboriginals, and
many more. Of these, only the Kurdish and the Sami flag *may* be covered by
Unicode Emoji 5.0+ (possibly with multiple codes) until yet another (Tag-based)
scheme is adopted.

> Similarly, Microsoft could propose adding the ninja cat ZWJ sequences.

I still fail to see how it is a good or smart thing to have to maintain Emoji
Tag Sequences *and* Emoji ZWJ Sequences, when adopting the latter for flags
would have had at least the following advantages: 

- actually useful fallback
- application beyond ISO 3166 restrictions

RE: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Doug Ewell

Ken Whistler wrote:

> *But*, the ones who do have flags on their phones don't want to be in
> the situation where the iPhone has a flag of Scotland which then shows
> up as a flag tofu on an Android phone, but an Android phone has a flag
> of Texas which then shows up as a flag tofu on an iPhone, etc., etc.
> That way leads to customer complaint madness, with 1000's (hundreds of
> 1000's?) of complaints: "My phone is screwed up, fix it!"

Doesn't this same problem exist for other emoji, or non-emoji, that are
supported on some phones but not others? What's the customer service
resolution in those cases?
 
--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Andrew West

On 29 March 2017 at 21:09, Doug Ewell  wrote:
>
>> I think "recommended" could be renamed to "(expected to be) widely
>> implemented".
>
> That's a modest improvement; it shifts from an advisory health warning
> not to use certain sequences to what it is, speculation that some
> sequences will be far better supported in the field than others.

I don't think that would work.
http://www.unicode.org/Public/emoji/5.0/emoji-sequences.txt explicitly
lists just the three subdivision flags for England, Scotland and Wales
under Emoji Tag Sequences, which indicates that they are special in an
undefined way that none of the thousands of other potential
subdivision flag tag sequences are. There must be a higher bar for
inclusion in the Emoji data files than simply that they are expected
to be widely implemented. Their inclusion in the Emoji data files and
the Emoji charts
(http://www.unicode.org/emoji/charts/emoji-ordering.html) must
indicate that only these three tag sequences are recommended or
sanctioned by the UTC.

(In case anyone thinks I support the current situation, let me state
that I disagree vehemently with the UTC decision to only "recommend"
these three particular subdivision flag tag sequences.)

Andrew

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Ken Whistler



On 3/29/2017 1:12 PM, Doug Ewell wrote:

> I would think vendors could make their own business decisions about what
> flags to support. "Hmm, yeah, definitely Texas, maybe Lombardy, not so
> sure about Colorado, probably not Guna Yala." I don't see why they had
> to be essentially told what to support and what not to.


I think you have it approximately backwards. It isn't the UTC telling 
the vendors "what to support and what not to" -- it was the vendors 
saying "this is what we need to support, and we'd like to not do it in a 
haphazard way, so let's tell the UTC what we want them to document in 
the data for UTS #51."


You are correct that the vendors can make their own business decisions. 
And apparently as of now, Microsoft, for whatever reason, has made its 
business decision not to support flag emoji *at all* on its phones. 
O.k., that is their decision. So no Texas, no Lombardy, no Colorado, no 
Guna Yala, but also no Japan, no Great Britain, no Scotland... Other 
vendors have decided *to* support flag emoji on their phone platforms. 
O.k., that is their decision. *But*, the ones who do have flags on their 
phones don't want to be in the situation where the iPhone has a flag of 
Scotland which then shows up as a flag tofu on an Android phone, but an 
Android phone has a flag of Texas which then shows up as a flag tofu on 
an iPhone, etc., etc. That way leads to customer complaint madness, with
1000's (hundreds of 1000's?) of complaints: "My phone is screwed up, fix 
it!"


Or maybe you want the job on the consumer complaint line about that 
topic. ;-)


--Ken

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Doug Ewell

Martin J. Dürst wrote:

> I think there is some missing information here. First, the original
> proposal that used invalid UTF-8 sequences never was an RFC, only an
> Internet Draft.

Yes, you're right. I realized that a minute after "Send" but didn't
think it changed the story enough to justify a correction. For the
curious, the I-D is at
https://www.ietf.org/archive/id/draft-ietf-acap-mlsf-01.txt .

> But what's more important, the protocol that motivated all this work
> (ACAP) never went anywhere. Nor did any other use of the plane 14
> language tag characters get any kind of significant traction. That
> led to deprecation, because it would have been a bad idea to let
> people think that the information in these taggings would actually be
> used.

Is that common practice in Unicode, that if something doesn't gain
significant traction in the comparatively short term, it becomes a
candidate for deprecation?

> For some people (including me), that was always seen as the likely
> outcome; the language tag characters were mostly introduced as a
> defensive mechanism (way better than invalid UTF-8) rather than
> something we hoped everybody would jump on. Putting them on plane 14
> (which meant that it would be four bytes for each character, and
> therefore quite a lot of bytes for each tag) was part of that message.

I understand the "defensive" aspect of trying to prevent people from
abusing Unicode, especially in the 1997–1998 time frame when UTF-8 was
still new and people didn't realize the cost of tampering with it.

But if you're going to build a mechanism at all, it seems peculiar to
define it in full but then discourage its intended use at the outset, or
to build it in such a way that users will find it difficult or
unpalatable to use.

> I think the situation is vastly different here. First, the Consortium
> never officially 'activated' any subdivision flags, so it would be
> impossible to deprecate them.

The Emoji 5.0 mechanism of using tag sequences for three subdivision
flags was announced earlier this week. The specification grudgingly
allows, but non-recommends, use of the mechanism for any other flags. It
is that grudging allowance that could be deprecated, not any of the
specific flags.
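
For readers who have not seen the mechanism spelled out, here is a minimal
Python sketch of how such a flag sequence is assembled. It assumes the
structure described in UTS #51: U+1F3F4 WAVING BLACK FLAG as the base, one
Plane 14 tag character (U+E0000 plus the ASCII value) per character of the
lowercase subdivision code, and U+E007F CANCEL TAG as the terminator. The
function name and the input clean-up are illustrative assumptions, not part of
any specification.

```python
BLACK_FLAG = "\U0001F3F4"   # WAVING BLACK FLAG, the base character
CANCEL_TAG = "\U000E007F"   # CANCEL TAG, terminates the tag sequence
TAG_OFFSET = 0xE0000        # tag character = TAG_OFFSET + ASCII code point


def subdivision_flag(code: str) -> str:
    """Build the emoji tag sequence for a subdivision code such as 'gbsct'.

    Hypothetical helper: it only assembles the sequence and does not check
    that the code is a valid or recommended subdivision.
    """
    code = code.lower().replace("-", "")
    if not (code.isascii() and code.isalnum()):
        raise ValueError(f"not a plausible subdivision code: {code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in code)
    return BLACK_FLAG + tags + CANCEL_TAG


scotland = subdivision_flag("gbsct")   # one of the three listed sequences
texas = subdivision_flag("ustx")       # valid, but not listed for Emoji 5.0
print([f"U+{ord(c):04X}" for c in scotland])
```

Both sequences are built in exactly the same way; the only difference between
them is whether the result appears in the Emoji 5.0 data files.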

> Second, we already see some pressure (on this list) to 'recommend'
> more of these, and I guess the vendors and the Consortium will give in
> to this pressure, even if slowly and to some extent quite reluctantly.
> It's anyone's bet in what time frame and order e.g. the flags of
> California and Texas will be 'recommended'. But I have personally no
> doubt that these (and quite a few others) will eventually make it,
> even if I have mixed feelings about that.

Then what was the benefit of "not recommending" them in the first place?
Why is it a problem if vendors look at the list of 5100 or so
subdivisions, or even the small subset that actually have flags, and
think, "OMG, look at all those new flags we'll be forced to support"? Is
this any different from when a new CJK extension or other large block of
characters is added?

I would think vendors could make their own business decisions about what
flags to support. "Hmm, yeah, definitely Texas, maybe Lombardy, not so
sure about Colorado, probably not Guna Yala." I don't see why they had
to be essentially told what to support and what not to. 

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Doug Ewell

Markus Scherer wrote:

> I think "recommended" could be renamed to "(expected to be) widely
> implemented". 

That's a modest improvement; it shifts from an advisory health warning
not to use certain sequences to what it is, speculation that some
sequences will be far better supported in the field than others.

I still don't see why this distinction is necessary. It's not made for
other emoji or non-emoji. I have no fonts for Tai Tham,¹ which has been
in Unicode since 2009, but I don't see any warnings against using Tai
Tham because someone like me might not have a font for it.

¹ No, I'm not looking for one; that isn't the point.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread Markus Scherer

I think "recommended" could be renamed to "(expected to be) widely
implemented".
markus

Re: Unicode Emoji 5.0 characters now final

2017-03-29 Thread William_J_G Overington

Mark E. Shoulson wrote:

> Kind of have to agree with Doug here. Either support the mechanism or don't.  
> Saying "we, you CAN do this if you WANT to" always implies a "...but 
> you probably shouldn't." Why even bother making it a possibility?

Mark's use of we made me smile and brightened my day, because it 
resonated with my use, in a different context, of wol near the end of the 
last page of Chapter 16 of my novel.

http://www.users.globalnet.co.uk/~ngo/localizable_sentences_the_novel_chapter_016.pdf
 A PDF document of size 31.01 kilobytes.

Returning to what Doug and Mark wrote. When I read things like "not 
recommended" I imagine a situation where someone employed by a large 
information technology company is the person who actually sits down with the 
specification documents and makes a decision as to what to implement. That 
person is probably not one of the people in charge of running the company.

So the person may well have an annual review meeting with people several steps 
up the hierarchy of the company, people who can promote, grudgingly continue to 
employ, or sack the employee.

So I imagine the possibility of, at that meeting, the question of "Why did you 
implement all of those flags in our product?" being asked.

The employee then explains his or her thinking, a desire to help end users and 
to have compatibility with communication with devices made by other 
manufacturers and for it all to be colourful and fun.

The employee is then asked if he or she knew that implementation was not 
recommended. Did he or she know of that and go the other way, thinking he or 
she knew better, or had he or she not read that part of the documentation?

So maybe the employee takes such a possible scenario into account when deciding 
whether to implement the flags in the first place. Relying on "not recommended" 
is safer. If the people higher up get letters from consumers asking for 
implementation and they ask for it to be done, then good, that would be 
enjoyable, but why be the one who could be criticised.

I also imagine a scenario in which, instead of "not recommended", the advice 
had been that it would be great and progressive if lots of flags were 
implemented in lots of products, as soon as possible, by this summer if 
possible, ready for displaying at the conference in the autumn. To help that 
along, here are some links to some free-to-use open source artwork that 
Unicode Inc. is making available in case you want to use it, and here are 
some links to some free-to-use open source OpenType font glyph substitution 
code that Unicode Inc. is making available in case you want to use it.

Well, why not? :-)

William Overington

Wednesday 29 March 2017

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Martin J. Dürst


Hello Doug,

On 2017/03/29 03:41, Doug Ewell wrote:


> If this story sounds vaguely familiar to old-timers, it's exactly the
> path that was followed the last time Plane 14 tag characters were under
> discussion, between 1998 and 2000: someone wrote an RFC to embed
> language tags in plain text using invalid UTF-8 sequences; Unicode
> responded by introducing a proper, conformant mechanism to use Plane 14
> characters instead; then the conformant replacement mechanism itself was
> deprecated and users were told to use out-of-band tagging, exactly what
> the original RFC sought to avoid.


I think there is some missing information here. First, the original 
proposal that used invalid UTF-8 sequences never was an RFC, only an 
Internet Draft. But what's more important, the protocol that motivated 
all this work (ACAP) never went anywhere. Nor did any other use of the 
plane 14 language tag characters get any kind of significant traction. 
That led to deprecation, because it would have been a bad idea to let 
people think that the information in these taggings would actually be used.


For some people (including me), that was always seen as the likely 
outcome; the language tag characters were mostly introduced as a 
defensive mechanism (way better than invalid UTF-8) rather than 
something we hoped everybody would jump on. Putting them on plane 14 
(which meant that it would be four bytes for each character, and 
therefore quite a lot of bytes for each tag) was part of that message.
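
The byte cost mentioned above is easy to check. The following Python sketch
assumes the (now deprecated) scheme of U+E0001 LANGUAGE TAG followed by one
Plane 14 tag character per ASCII character of the language code; the helper
name is hypothetical and the example is only meant to make the arithmetic
concrete.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG (deprecated)
TAG_OFFSET = 0xE0000         # tag character = TAG_OFFSET + ASCII code point


def plane14_language_tag(lang: str) -> str:
    """Hypothetical helper: spell a language code with Plane 14 tag characters."""
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(ch)) for ch in lang)


tagged = plane14_language_tag("en")
code_points = len(tagged)                 # LANGUAGE TAG + 2 tag letters
utf8_bytes = len(tagged.encode("utf-8"))  # every Plane 14 character takes 4 bytes
print(code_points, utf8_bytes)
```

Under these assumptions a two-letter language code costs three code points and
twelve bytes in UTF-8, against two bytes for the plain ASCII letters.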




> "Not recommended," "not standard," "not interoperable," or any other
> term ESC settles on for the 5000+ valid flag sequences that are not
> England, Scotland, and Wales is just a short, easy step away from
> deprecation for these as well.


I think the situation is vastly different here. First, the Consortium 
never officially 'activated' any subdivision flags, so it would be 
impossible to deprecate them. Second, we already see some pressure (on 
this list) to 'recommend' more of these, and I guess the vendors and the 
Consortium will give in to this pressure, even if slowly and to some 
extent quite reluctantly. It's anyone's bet in what time frame and order 
e.g. the flags of California and Texas will be 'recommended'. But I have 
personally no doubt that these (and quite a few others) will eventually 
make it, even if I have mixed feelings about that.


Regards,   Martin.

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Mark E. Shoulson

Kind of have to agree with Doug here. Either support the mechanism or 
don't.  Saying "we, you CAN do this if you WANT to" always 
implies a "...but you probably shouldn't."  Why even bother making it a 
possibility?


On 03/28/2017 02:41 PM, Doug Ewell wrote:

> "Even though it is possible to support the US states, or any subset of
> them, implementations don’t have to." Well, of course they don't.
> Implementations don't have to support the three British flags either if
> they don't want to, or any national flags or other emoji, or any
> particular character for that matter. The superfluous statement is
> easily reduced to "Don't do this."


That's a pretty good re-statement.

~mark

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Markus Scherer

On Tue, Mar 28, 2017 at 11:41 AM, Doug Ewell wrote:

> Mark Davis wrote:
>
> > 3. Valid, but not recommended: "usca". Corresponds to the valid
> > Unicode subdivision code for California according to
> > http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
> > and CLDR, but is not listed in http://unicode.org/Public/emoji/5.0/.
>
> "Not recommended" is no better and no less disappointing than "not
> standard." Both phrases imply strongly that the sequence, while
> syntactically valid, should not be used.
>

I think the distinction between "valid" and "recommended" is confusing
terminology-wise, but it does make sense to have a distinction between
"valid" and "we know that one or more vendors are motivated to show these
sequences as single glyphs". "valid" is clearly defined, and then there is
a subset of valid that's listed in a catalog.

Just like anyone is free to string some characters together with
intervening ZWJ, but it is useful to have a catalog of sequences that are,
or are going to be, in actual use, so that it is known which sequences are
likely to work more or less the same on some set of devices.

This right now is the right time to propose better wording in the spec so
that implementers like you don't feel like they may get the rug pulled from
under them down the road.

markus

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Richard Wordingham

On Tue, 28 Mar 2017 11:41:38 -0700
"Doug Ewell" wrote:

> "Not recommended," "not standard," "not interoperable," or any other
> term ESC settles on for the 5000+ valid flag sequences that are not
> England, Scotland, and Wales is just a short, easy step away from
> deprecation for these as well.

It's certainly on the cards that the sequence for the Scottish flag will
be deprecated in favour of an RI sequence.

Richard.

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Doug Ewell

Mark Davis wrote:

> 3. Valid, but not recommended: "usca". Corresponds to the valid
> Unicode subdivision code for California according to
> http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
> and CLDR, but is not listed in http://unicode.org/Public/emoji/5.0/.

"Not recommended" is no better and no less disappointing than "not
standard." Both phrases imply strongly that the sequence, while
syntactically valid, should not be used.

Burying a disclaimer that "implementations can support them, but they
may not interoperate well" in the speaker's notes of slide 38 of a
53-page presentation does nothing to change this perception.

"Even though it is possible to support the US states, or any subset of
them, implementations don’t have to." Well, of course they don't.
Implementations don't have to support the three British flags either if
they don't want to, or any national flags or other emoji, or any
particular character for that matter. The superfluous statement is
easily reduced to "Don't do this."

Joan Montané's return to the list to comment on this issue was
interesting because of a post from February 2015, in which Andrea
Giammarchi reported [1] on Joan's request [2] for Twitter to support
flags for specific "active online communities" that happened to have a
TLD, by stringing three or more Regional Indicator Symbols together:

> [S][C][O][T] --> it shows Scottish flag
> [C][Y][M][R][U] --> it shows a Welsh flag
> [B][Z][H] --> it shows a Breton flag
> [C][A][T] --> it shows Catalan flag
> [E][U][S] --> it shows a Basque flag
> [G][A][L] --> it shows a Galician flag

[1] http://www.unicode.org/mail-arch/unicode-ml/y2015-m02/0039.html
[2] https://github.com/twitter/twemoji/issues/40

Of course this approach was incompatible with conformant use of RIS;
visit [2] with an RIS-conformant browser to see the inadvertently
displayed flags of Seychelles, Cyprus, Belize, Canada, etc.
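
To make the incompatibility concrete, here is a small Python sketch. It
assumes the usual conformant behaviour of pairing Regional Indicator Symbols
left to right, two at a time; the function names are hypothetical.

```python
RI_BASE = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def to_regional_indicators(letters: str) -> str:
    """Map ASCII letters A-Z to the corresponding Regional Indicator Symbols."""
    return "".join(chr(RI_BASE + ord(ch) - ord("A")) for ch in letters.upper())


def pair_up(ris: str) -> list[str]:
    """Split a run of Regional Indicator Symbols into pairs, left to right.

    Returns the pairs as ordinary letters; a trailing unpaired symbol is
    returned as a single letter.
    """
    letters = "".join(chr(ord(ch) - RI_BASE + ord("A")) for ch in ris)
    return [letters[i:i + 2] for i in range(0, len(letters), 2)]


for word in ("SCOT", "CYMRU", "BZH", "CAT"):
    print(word, "->", pair_up(to_regional_indicators(word)))
```

The first pair of each run is what a conformant renderer would try to show as
a flag: SC (Seychelles), CY (Cyprus), BZ (Belize), CA (Canada).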

I don't know if the ensuing thread helped inspire ESC to pursue the
present mechanism involving sequences of Plane 14 tags -- the earliest
mention I can find is PRI #299, just a few months later -- but the
intent seemed straightforward and sensible: provide an official,
conformant mechanism to support a recognized user need, with a suitable
fallback strategy, rather than encouraging users via inaction to adopt a
non-conformant and broken solution.

Unfortunately, the follow-up turned out to be "... and then discourage
THAT mechanism as well, except in a couple of selected cases, and tell
people to use stickers instead."

If this story sounds vaguely familiar to old-timers, it's exactly the
path that was followed the last time Plane 14 tag characters were under
discussion, between 1998 and 2000: someone wrote an RFC to embed
language tags in plain text using invalid UTF-8 sequences; Unicode
responded by introducing a proper, conformant mechanism to use Plane 14
characters instead; then the conformant replacement mechanism itself was
deprecated and users were told to use out-of-band tagging, exactly what
the original RFC sought to avoid.

"Not recommended," "not standard," "not interoperable," or any other
term ESC settles on for the 5000+ valid flag sequences that are not
England, Scotland, and Wales is just a short, easy step away from
deprecation for these as well.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Mark Davis ☕️

Thanks

Mark

On Tue, Mar 28, 2017 at 1:01 PM, Philippe Verdy  wrote:

> I just filed the bug in the CLDR contact form.

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Philippe Verdy

I just filed the bug in the CLDR contact form.

2017-03-28 12:49 GMT+02:00 Mark Davis ☕️ :

> Thanks. Probably best as:
>
> unicode_locale_id = unicode_language_id
> ( transformed_extensions unicode_locale_extensions?
> | unicode_locale_extensions transformed_extensions? )?
> ;
>
> even clearer would be two steps:
>
> unicode_locale_id = unicode_language_id extensions? ;
>
> extensions= transformed_extensions unicode_locale_extensions?
>   | unicode_locale_extensions transformed_extensions? ;
>
> Could you file a CLDR ticket on this?

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Mark Davis ☕️

Thanks. Probably best as:

unicode_locale_id = unicode_language_id
( transformed_extensions unicode_locale_extensions?
| unicode_locale_extensions transformed_extensions? )?
;

even clearer would be two steps:

unicode_locale_id = unicode_language_id extensions? ;

extensions= transformed_extensions unicode_locale_extensions?
  | unicode_locale_extensions transformed_extensions? ;

Could you file a CLDR ticket on this?


Mark

On Tue, Mar 28, 2017 at 12:36 PM, Philippe Verdy  wrote:

> I note this in TR35
> *3.2 Unicode Locale Identifier*
>
> EBNF:
>
> unicode_locale_id = unicode_language_id
>   ( transformed_extensions unicode_locale_extensions?
>   | unicode_locale_extensions? transformed_extensions? ) ;
>
> ABNF:
>
> unicode_locale_id = unicode_language_id
>   ( [trasformed_extensions [unicode_locale_extensions]]
>   / [unicode_locale_extensions [transformed_extensions]] )
>
> * first there's a typo in the ABNF syntax ("trasformed")
> * the syntax is not strictly equivalent, or the ABNF is unnecessarily not
> context-free
>
> It would be better as:
>
> EBNF:
>
> unicode_locale_id = unicode_language_id
>   ( transformed_extensions unicode_locale_extensions?
>   | unicode_locale_extensions transformed_extensions? )? ;
>
> ABNF:
>
> unicode_locale_id = unicode_language_id
>   [ transformed_extensions [unicode_locale_extensions]
>   / unicode_locale_extensions [transformed_extensions] ]

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Philippe Verdy

I note this in TR35
*3.2 Unicode Locale Identifier*

EBNF:

unicode_locale_id = unicode_language_id
  ( transformed_extensions unicode_locale_extensions?
  | unicode_locale_extensions? transformed_extensions? ) ;

ABNF:

unicode_locale_id = unicode_language_id
  ( [trasformed_extensions [unicode_locale_extensions]]
  / [unicode_locale_extensions [transformed_extensions]] )

* first there's a typo in the ABNF syntax ("trasformed")
* the syntax is not strictly equivalent, or the ABNF is unnecessarily not
context-free

It would be better as:

EBNF:

unicode_locale_id = unicode_language_id
  ( transformed_extensions unicode_locale_extensions?
  | unicode_locale_extensions transformed_extensions? )? ;

ABNF:

unicode_locale_id = unicode_language_id
  [ transformed_extensions [unicode_locale_extensions]
  / unicode_locale_extensions [transformed_extensions] ]
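
A quick way to compare the extension part of the rules above is to treat each
extension block as a single symbol and enumerate what each rule accepts. The
Python sketch below is only a toy model under that assumption: T stands for
transformed_extensions, U for unicode_locale_extensions, and the regular
expressions are hand translations of the rules, not anything taken from CLDR.

```python
import re
from itertools import product

# T = transformed_extensions, U = unicode_locale_extensions (toy symbols).
CURRENT_EBNF = re.compile(r"(?:TU?|U?T?)")            # ( T U? | U? T? )
CURRENT_ABNF = re.compile(r"(?:(?:TU?)?|(?:UT?)?)")   # ( [T [U]] / [U [T]] )
PROPOSED = re.compile(r"(?:TU?|UT?)?")                # ( T U? | U T? )?


def accepted(pattern: re.Pattern, max_len: int = 3) -> list[str]:
    """List every T/U string up to max_len that the pattern matches in full."""
    words = (
        "".join(p)
        for n in range(max_len + 1)
        for p in product("TU", repeat=n)
    )
    return sorted(w for w in words if pattern.fullmatch(w))


print(accepted(CURRENT_EBNF))
print(accepted(CURRENT_ABNF))
print(accepted(PROPOSED))
```

In this simplified model all three rules accept the same five combinations
(nothing, T, TU, U, UT). What the proposed form removes is the overlap: in the
current EBNF a lone T can be derived from either alternative, whereas in the
proposed form each combination has exactly one derivation.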




Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Mark Davis ☕️

Good questions.

On Tue, Mar 28, 2017 at 11:56 AM, Joan Montané wrote:

> 1st one: point 4 (Unicode subdivision codes listed in emoji Unicode site)
> gives rise to something like a chicken-and-egg problem. Vendors don't
> easily add new subdivision flags (because they aren't recommended), and
> Unicode doesn't recommend new subdivision flags (because they aren't
> supported by vendors).
>

That isn't really the case. In particular, vendors can propose adding
additional subdivisions to the recommended list. The UTC Considerations would
come into play in assessing those proposals. So it is certainly possible
for there to be (say) a flag of Texas or Catalonia appearing in an Emoji
6.0 release this year. Similarly, Microsoft could propose adding the ninja
cat ZWJ sequences.

> 2nd one: What about "Adopt a Character" (AKA "Adopt an emoji")? Will
> valid, but not recommended, Unicode subdivision codes be eligible? For
> instance, say, could someone adopt California, Texas, Pomerania, or
> Catalonia flags?
>

We only support the recommended list for adoptions.

Mark

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Joan Montané

2017-03-28 7:57 GMT+02:00 Mark Davis ☕️ :

> To add to what Ken and Markus said: like many other identifiers, there are
> a number of different categories.
>
> 1. *Ill-formed:* "$1"
> 2. *Well-formed, but not valid:* "usx". Is *syntactic* according to
>    http://unicode.org/reports/tr51/proposed.html#def_emoji_tag_sequence,
>    but is not *valid* according to
>    http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences.
> 3. *Valid, but not recommended:* "usca". Corresponds to the valid
>    Unicode subdivision code for California according to
>    http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
>    and CLDR, but is not listed in http://unicode.org/Public/emoji/5.0/.
> 4. *Recommended:* "gbsct". Corresponds to the valid Unicode
>    subdivision code for Scotland, and *is* listed in
>    http://unicode.org/Public/emoji/5.0/.
>
>  As Ken says, the terminology is a little bit in flux for term
> 'recommended'. TR51 is still open for comment, although we won't make any
> changes that would invalidate http://unicode.org/Public/emoji/5.0/.
>
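
The four categories quoted above can be sketched as a simple classifier. This
Python example is a rough illustration only: the well-formedness pattern is an
assumed stand-in for the syntax in UTS #51, the "valid" set is a tiny
hypothetical sample standing in for the CLDR subdivision data, and the
"recommended" set is the three sequences named in this thread (England,
Scotland, Wales).

```python
import re
from enum import Enum


class Status(Enum):
    ILL_FORMED = 1
    WELL_FORMED_NOT_VALID = 2
    VALID_NOT_RECOMMENDED = 3
    RECOMMENDED = 4


# Assumption: two region letters plus one to three letters or digits.
WELL_FORMED = re.compile(r"[a-z]{2}[a-z0-9]{1,3}")

# Hypothetical sample; the real list of valid codes comes from CLDR.
VALID_SAMPLE = frozenset({"usca", "ustx", "gbeng", "gbsct", "gbwls"})

# The three sequences listed for Emoji 5.0, per the discussion in this thread.
RECOMMENDED = frozenset({"gbeng", "gbsct", "gbwls"})


def classify(tag_spec: str) -> Status:
    """Classify a subdivision tag value into one of the four categories."""
    if not WELL_FORMED.fullmatch(tag_spec):
        return Status.ILL_FORMED
    if tag_spec not in VALID_SAMPLE:
        return Status.WELL_FORMED_NOT_VALID
    if tag_spec not in RECOMMENDED:
        return Status.VALID_NOT_RECOMMENDED
    return Status.RECOMMENDED


for value in ("$1", "usx", "usca", "gbsct"):
    print(value, "->", classify(value).name)
```

Each check narrows the previous one, so the categories nest: every recommended
value is valid, and every valid value is well-formed.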

Just two remarks

1st one: point 4 (Unicode subdivision codes listed in emoji Unicode site)
gives rise to something like a chicken-and-egg problem. Vendors don't easily
add new subdivision flags (because they aren't recommended), and Unicode
doesn't recommend new subdivision flags (because they aren't supported by
vendors).

2nd one: What about "Adopt a Character" (AKA "Adopt an emoji")? Will
valid, but not recommended, Unicode subdivision codes be eligible? For
instance, say, could someone adopt California, Texas, Pomerania, or
Catalonia flags?


Regards,
Joan Montané

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Mark Davis ☕️

(I'm sure you know this, Philippe, but a reminder for others: as far as the
Unicode projects go, discussions on this list have no effect unless they
are turned into a submission (UTC or Emoji proposal, CLDR or ICU ticket).)

If you see any problems in the CLDR data, please file a ticket at
http://unicode.org/cldr/trac/newticket. Please only include the problem
cases. (Note that it is *not* a goal for CLDR to include all ISO
subdivisions going back in time; just back to 2015-09. And even there, if
an ISO subdivision is introduced after the start of a CLDR version, but
retracted before that version releases, it won't be included. If retracted
in a later version, it is moved to the deprecated set.)

Mark

2017-03-28 3:38 GMT+02:00 Philippe Verdy :

> I try to summarize the situation for France, There are some missing codes
>
>  France métropolitaine (deprecated: [fx]):
>Départements métropolitains:
>  [fr01~19 fr2a~b fr21~68 fr70-95] (unchanged)
>  [fr6d]  Rhône (département)  (missing, included
> in [fr69]?)
>Statuts particuliers:
>  [fr69]  Rhône (circonscription départementale)
>  [fr6m]  Métropole de Lyon(missing, included
> in [fr69]?)
>Régions métropolitaines:
>  [frara] Auvergne-Rhône-Alpes (new)
>   - Auvergne  (former)(deprecated: [frc])
>   - Rhône-Alpes   (former)(deprecated: [frv])
>  [frbfc] Bourgogne-Franche-Comté  (new)
>   - Bourgogne (former)(deprecated: [frd])
>   - Franche-Comté (former)(deprecated: [fri])
>  [frbre] Bretagne (unchanged) (deprecated: [fre])
>  [frcor] Corse (collectivité territoriale de) (deprecated: [frh])
>  [frcvl] Centre-Val de Loire  (deprecated: [frf])
>  [frges] Grand-Est(new)
>   - Alsace(former)(deprecated: [fra])
>   - Champagne-Ardenne (former)(deprecated: [frg])
>   - Lorraine  (former)(deprecated: [frm])
>  [frhdf] Hauts-de-France  (new)
>   - Nord-Pas-de-Calais(former)(deprecated: [fro])
>   - Picardie  (former)(deprecated: [frs])
>  [fridf] Île-de-France(deprecated: [frj])
>  [frnaq] Nouvelle-Aquitaine   (new)
>   - Aquitaine (former)(deprecated: [frb])
>   - Limousin  (former)(deprecated: [frl])
>   - Poitou-Charentes  (former)(deprecated: [frt])
>  [frnor] Normandie(new)
>   - Basse-Normandie   (former)(deprecated: [frp])
>   - Haute-Normandie   (former)(deprecated: [frq])
>  [frocc] Occitanie(new)
>   - Languedoc-Roussillon  (former)(deprecated: [frk])
>   - Midi-Pyrénées (former)(deprecated: [frn])
>  [frpac] Provence-Alpes-Cote d'Azur   (deprecated: [fru])
>  [frpdl] Pays de la Loire (deprecated: [frr])
>  Départements/régions d'outre-mer (DOM/ROM):
>  [gp]Guadeloupe (département) (deprecated: [frgp])
>  [frgua] Guadeloupe (région)
>  [mq]Martinique (département) (deprecated: [frmq])
>  [frmar] Martinique (ancienne région) (missing?)
>  [gf]Guyane (département) (deprecated: [frgf])
>  [frguy] Guyane (ancienne région) (missing?)
>  [yt]Mayotte(département) (deprecated: [fryt])
>  [frmay] Mayotte(ancienne collectivité)
>  [re]La Réunion (département) (deprecated: [frre])
>  [frlre] La Réunion (région)
>  Autres outre-mers:
>Collectivités d'outre-mer (COM):
>  [bl] Saint-Barthélemy(deprecated: [frbl])
>  [mf] Saint-Martin (partie française) (deprecated: [frmf])
>  [pf] Polynésie française (deprecated: [frpf])
>  [pm] Saint-Pierre-et-Miquelon(deprecated: [frpm])
>  [tf] Terres australes et antarctiques françaises (deprecated: [frtf])
>  [wf] Wallis-et-Futuna(deprecated: [frwf])
>Statuts particuliers:
>  [nc]  Nouvelle-Calédonie (deprecated: [frnc])
>  [cp]  Clipperton (deprecated: [frcp])
>
>
> 2017-03-28 2:28 GMT+02:00 Markus Scherer :
>
>> On Mon, Mar 27, 2017 at 5:09 PM, Philippe Verdy 
>> wrote:
>>
>>> I followed the links. Check your links, you are referencing the
>>> proposal, and this contradicts the published version 4.0 of TR51. Where is
>>> stability ?
>>>
>>
>> Of course I

Re: Unicode Emoji 5.0 characters now final

2017-03-28 Thread Mark Davis ☕️

To add to what Ken and Markus said: like many other identifiers, there are
a number of different categories.

   1. *Ill-formed: *"$1"
   2. *Well-formed, but not valid: *"usx". Is *syntactic* according to
   http://unicode.org/reports/tr51/proposed.html#def_emoji_tag_sequence,
   but is not *valid* according to
   http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
   .
   3. *Valid, but not recommended: "usca". *Corresponds to the valid
   Unicode subdivision code for California according to
   http://unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
   and CLDR, but is not listed in http://unicode.org/Public/emoji/5.0/.
   4. *Recommended:* "gbsct". Corresponds to the valid Unicode subdivision
   code for Scotland, and *is* listed in
   http://unicode.org/Public/emoji/5.0/.

 As Ken says, the terminology is a little bit in flux for term
'recommended'. TR51 is still open for comment, although we won't make any
changes that would invalidate http://unicode.org/Public/emoji/5.0/.
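
The four categories above can be sketched as a small classifier. This is only an illustration: the rule that a well-formed code is a non-empty string over [0-9a-z] is an assumption drawn from the proposed TR51 text quoted elsewhere in this thread, and the two sets are tiny hypothetical stand-ins for the CLDR validity data and the Emoji 5.0 data files, holding only codes mentioned in this thread.

```python
import re

# Hypothetical stand-ins for the real data sources (CLDR validity data and
# the Emoji 5.0 data files); only codes mentioned in this thread are listed.
SAMPLE_VALID = {"usca", "plpm", "gbeng", "gbsct", "gbwls"}
SAMPLE_RECOMMENDED = {"gbeng", "gbsct", "gbwls"}


def classify(sd: str) -> str:
    """Place a subdivision-style code into one of the four categories."""
    # Assumed well-formedness rule: non-empty, only characters in [0-9a-z].
    if not re.fullmatch(r"[0-9a-z]+", sd):
        return "ill-formed"
    if sd not in SAMPLE_VALID:
        return "well-formed, but not valid"
    if sd not in SAMPLE_RECOMMENDED:
        return "valid, but not recommended"
    return "recommended"


for code in ("$1", "usx", "usca", "gbsct"):
    print(code, "->", classify(code))
```

A real implementation would load both sets from the published data rather than hard-coding them.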

I would also encourage people to look at the slides on
http://unicode.org/emoji/, together with the speaker notes, since some of
those slides present this very issue. I'm sure the people on this list will
have some useful comments for improvements.

Another item: with Tayfun's help, we updated
http://unicode.org/press/emoji.html. If people have any feedback on other
articles that should be on that list, please let us know...

Mark

On Tue, Mar 28, 2017 at 2:28 AM, Markus Scherer 
wrote:

> On Mon, Mar 27, 2017 at 5:09 PM, Philippe Verdy 
> wrote:
>
>> I followed the links. Check your links, you are referencing the proposal,
>> and this contradicts the published version 4.0 of TR51. Where is stability ?
>>
>
> Of course I am pointing to the proposal. The version of TR 51 under review
> adds a mechanism that didn't exist before. It's an addition, not a
> contradiction. Once it's there it will be stable.
> markus
>

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Philippe Verdy

I'll try to summarize the situation for France. There are some missing codes.

 Metropolitan France (deprecated: [fx]):
   Metropolitan departments:
     [fr01~19 fr2a~b fr21~68 fr70~95]           (unchanged)
     [fr6d]  Rhône (department)                 (missing, included in [fr69]?)
   Special statuses:
     [fr69]  Rhône (departmental constituency)
     [fr6m]  Métropole de Lyon                  (missing, included in [fr69]?)
   Metropolitan regions:
     [frara] Auvergne-Rhône-Alpes               (new)
       - Auvergne                               (former)    (deprecated: [frc])
       - Rhône-Alpes                            (former)    (deprecated: [frv])
     [frbfc] Bourgogne-Franche-Comté            (new)
       - Bourgogne                              (former)    (deprecated: [frd])
       - Franche-Comté                          (former)    (deprecated: [fri])
     [frbre] Bretagne                           (unchanged) (deprecated: [fre])
     [frcor] Corse (territorial collectivity)               (deprecated: [frh])
     [frcvl] Centre-Val de Loire                            (deprecated: [frf])
     [frges] Grand-Est                          (new)
       - Alsace                                 (former)    (deprecated: [fra])
       - Champagne-Ardenne                      (former)    (deprecated: [frg])
       - Lorraine                               (former)    (deprecated: [frm])
     [frhdf] Hauts-de-France                    (new)
       - Nord-Pas-de-Calais                     (former)    (deprecated: [fro])
       - Picardie                               (former)    (deprecated: [frs])
     [fridf] Île-de-France                                  (deprecated: [frj])
     [frnaq] Nouvelle-Aquitaine                 (new)
       - Aquitaine                              (former)    (deprecated: [frb])
       - Limousin                               (former)    (deprecated: [frl])
       - Poitou-Charentes                       (former)    (deprecated: [frt])
     [frnor] Normandie                          (new)
       - Basse-Normandie                        (former)    (deprecated: [frp])
       - Haute-Normandie                        (former)    (deprecated: [frq])
     [frocc] Occitanie                          (new)
       - Languedoc-Roussillon                   (former)    (deprecated: [frk])
       - Midi-Pyrénées                          (former)    (deprecated: [frn])
     [frpac] Provence-Alpes-Côte d'Azur                     (deprecated: [fru])
     [frpdl] Pays de la Loire                               (deprecated: [frr])
 Overseas departments/regions (DOM/ROM):
     [gp]    Guadeloupe (department)                        (deprecated: [frgp])
     [frgua] Guadeloupe (region)
     [mq]    Martinique (department)                        (deprecated: [frmq])
     [frmar] Martinique (former region)         (missing?)
     [gf]    Guyane (department)                            (deprecated: [frgf])
     [frguy] Guyane (former region)             (missing?)
     [yt]    Mayotte (department)                           (deprecated: [fryt])
     [frmay] Mayotte (former collectivity)
     [re]    La Réunion (department)                        (deprecated: [frre])
     [frlre] La Réunion (region)
 Other overseas territories:
   Overseas collectivities (COM):
     [bl]    Saint-Barthélemy                               (deprecated: [frbl])
     [mf]    Saint-Martin (French part)                     (deprecated: [frmf])
     [pf]    Polynésie française                            (deprecated: [frpf])
     [pm]    Saint-Pierre-et-Miquelon                       (deprecated: [frpm])
     [tf]    Terres australes et antarctiques françaises    (deprecated: [frtf])
     [wf]    Wallis-et-Futuna                               (deprecated: [frwf])
   Special statuses:
     [nc]    Nouvelle-Calédonie                             (deprecated: [frnc])
     [cp]    Clipperton                                     (deprecated: [frcp])


2017-03-28 2:28 GMT+02:00 Markus Scherer :

> On Mon, Mar 27, 2017 at 5:09 PM, Philippe Verdy 
> wrote:
>
>> I followed the links. Check your links, you are referencing the proposal,
>> and this contradicts the published version 4.0 of TR51. Where is stability ?
>>
>
> Of course I am pointing to the proposal. The version of TR 51 under review
> adds a mechanism that didn't exist before. It's an addition, not a
> contradiction. Once it's there it will be stable.
> markus
>

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Markus Scherer

On Mon, Mar 27, 2017 at 5:09 PM, Philippe Verdy  wrote:

> I followed the links. Check your links, you are referencing the proposal,
> and this contradicts the published version 4.0 of TR51. Where is stability ?
>

Of course I am pointing to the proposal. The version of TR 51 under review
adds a mechanism that didn't exist before. It's an addition, not a
contradiction. Once it's there it will be stable.
markus

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Philippe Verdy

I followed the links. Check your links, you are referencing the proposal,
and this contradicts the published version 4.0 of TR51. Where is stability ?

2017-03-28 2:06 GMT+02:00 Markus Scherer :

> On Mon, Mar 27, 2017 at 4:58 PM, Philippe Verdy 
> wrote:
>
>> This only describes the sequences encoded with 2 characters, not the
>> newer longer sequences for flags of subnational regions. the
>> unicode_region_subtag data does not contain anything about the flags for
>> the first 3 regions in GB.
>>
>
> Please read again what I quoted, and do follow the links.
> markus
>

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Philippe Verdy

Also these yellow statements from the initial proposal are contradicting
what is now published in TR51: "UN" and "EU" are accepted even if they are
"macroregions", not satisfying the quoted condition 2 in the proposed
update.

2017-03-28 1:58 GMT+02:00 Philippe Verdy :

> This only describes the sequences encoded with 2 characters, not the newer
> longer sequences for flags of subnational regions. the
> unicode_region_subtag data does not contain anything about the flags for
> the first 3 regions in GB.
>
> 2017-03-28 1:35 GMT+02:00 Markus Scherer :
>
>> On Mon, Mar 27, 2017 at 1:39 PM, Philippe Verdy 
>> wrote:
>>
>>> Note also that ISO3166-2 is far from being stable, and this could
>>> contradict Unicode encoding stability: it would then be required to ensure
>>> this stability by only allowing sequences that are effectively registered
>>> in http://www.unicode.org/Public/emoji/5.0/emoji-sequences.txt
>>> (independently of the registration in ISO3166-2), and nothing is said if
>>> ever ISO3166-2 obsoletes some codes and then some years later decides to
>>> reassign these codes to new entities: it should not be possible to do the
>>> same thing in Emoji sequences, and specific assignments will need to be
>>> made in the Unicode database.
>>>
>>
>> The emoji sequences are stable. Please read
>> http://www.unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
>> and follow the links to the CLDR spec and data.
>>
>> Let SD be the result of mapping each character in the tag_spec to a
>> character in [0-9a-z] by subtracting 0xE0000.
>>
>>    1. SD must then be a specification as per [CLDR] of either a Unicode
>>       subdivision_id (data) or a 3-digit unicode_region_subtag (data), and
>>    2. SD must have CLDR idStatus equal to "regular" or "deprecated".
>>
>>
>> markus
>>
>
>

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Markus Scherer

On Mon, Mar 27, 2017 at 4:58 PM, Philippe Verdy  wrote:

> This only describes the sequences encoded with 2 characters, not the newer
> longer sequences for flags of subnational regions. the
> unicode_region_subtag data does not contain anything about the flags for
> the first 3 regions in GB.
>

Please read again what I quoted, and do follow the links.
markus

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Philippe Verdy

This only describes the sequences encoded with 2 characters, not the newer
longer sequences for flags of subnational regions. The
unicode_region_subtag data does not contain anything about the flags for
the first 3 regions in GB.

2017-03-28 1:35 GMT+02:00 Markus Scherer :

> On Mon, Mar 27, 2017 at 1:39 PM, Philippe Verdy 
> wrote:
>
>> Note also that ISO3166-2 is far from being stable, and this could
>> contradict Unicode encoding stability: it would then be required to ensure
>> this stability by only allowing sequences that are effectively registered
>> in http://www.unicode.org/Public/emoji/5.0/emoji-sequences.txt
>> (independently of the registration in ISO3166-2), and nothing is said if
>> ever ISO3166-2 obsoletes some codes and then some years later decides to
>> reassign these codes to new entities: it should not be possible to do the
>> same thing in Emoji sequences, and specific assignments will need to be
>> made in the Unicode database.
>>
>
> The emoji sequences are stable. Please read
> http://www.unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
> and follow the links to the CLDR spec and data.
>
> Let SD be the result of mapping each character in the tag_spec to a
> character in [0-9a-z] by subtracting 0xE0000.
>
>    1. SD must then be a specification as per [CLDR] of either a Unicode
>       subdivision_id (data) or a 3-digit unicode_region_subtag (data), and
>    2. SD must have CLDR idStatus equal to "regular" or "deprecated".
>
>
> markus
>

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Markus Scherer

On Mon, Mar 27, 2017 at 1:34 PM, Ken Whistler  wrote:

> Anybody could *attempt* to convey a flag of Pomerania (a rather handsome
> black gryphon on a yellow background, btw) with an emoji tag sequence right
> now, I suppose.

I suppose not. Since it's bound to ISO 3166 subdivision codes (possibly
with CLDR additions), it would have to be "demv" for
https://en.wikipedia.org/wiki/Mecklenburg-Vorpommern or codes for adjacent
regions in Poland.

markus

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Markus Scherer

On Mon, Mar 27, 2017 at 1:39 PM, Philippe Verdy  wrote:

> Note also that ISO3166-2 is far from being stable, and this could
> contradict Unicode encoding stability: it would then be required to ensure
> this stability by only allowing sequences that are effectively registered
> in http://www.unicode.org/Public/emoji/5.0/emoji-sequences.txt
> (independently of the registration in ISO3166-2), and nothing is said if
> ever ISO3166-2 obsoletes some codes and then some years later decides to
> reassign these codes to new entities: it should not be possible to do the
> same thing in Emoji sequences, and specific assignments will need to be
> made in the Unicode database.
>

The emoji sequences are stable. Please read
http://www.unicode.org/reports/tr51/proposed.html#valid-emoji-tag-sequences
and follow the links to the CLDR spec and data.

Let SD be the result of mapping each character in the tag_spec to a
character in [0-9a-z] by subtracting 0xE0000.

   1. SD must then be a specification as per [CLDR] of either a Unicode
      subdivision_id (data) or a 3-digit unicode_region_subtag (data), and
   2. SD must have CLDR idStatus equal to "regular" or "deprecated".
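
The mapping and the idStatus check described above can be sketched roughly as follows. This is a minimal illustration, not the TR51 reference algorithm: the function names are invented here, and SAMPLE_ID_STATUS is a hypothetical stand-in for the CLDR validity data, filled in with a few codes that appear in this thread.

```python
TAG_OFFSET = 0xE0000  # a tag character minus this offset gives its ASCII value

# Hypothetical stand-in for the CLDR validity data (id -> idStatus).
SAMPLE_ID_STATUS = {
    "gbsct": "regular",
    "plpm": "regular",
    "frc": "deprecated",
}


def tag_spec_to_sd(tag_spec: str) -> str:
    """Map each tag character in tag_spec to its [0-9a-z] counterpart."""
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in tag_spec)


def is_valid_sd(sd: str, id_status=SAMPLE_ID_STATUS) -> bool:
    """Condition 2: idStatus must be "regular" or "deprecated"."""
    return id_status.get(sd) in ("regular", "deprecated")


tag_spec = "\U000E0070\U000E006C\U000E0070\U000E006D"  # tag letters p, l, p, m
sd = tag_spec_to_sd(tag_spec)
print(sd, is_valid_sd(sd))
```

Condition 1 (that SD is a subdivision_id or region subtag at all) is folded into the dictionary lookup in this sketch; a real implementation would check it against the CLDR data.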

markus

RE: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Doug Ewell

Philippe Verdy wrote:

> So it's up to the UTC to create this encoding: this new release is a
> start for a new vexillology registry (within encoded sequences) which
> creates a new standard for them.

Fine. If you think you can persuade UTC that this is within their scope,
go ahead. Let us know how that works out.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Doug Ewell

Ken Whistler wrote:

> By the way, if anybody is looking, Pomerania is there: "plpm" among
> the 4925 other valid unicode_subdivision_id values. So:
>
> Flag of Pomerania = 1F3F4 E0070 E006C E0070 E006D E007F
>
> But alas, that is not a *valid* emoji tag sequence (yet), so no soup
> for you!

This is a major letdown, after almost two years following the progress
of flag tag sequences, to find that the arguments that "these three
flags are special because they appear in international sports" have won
the day and the others are demoted to "non-standard." That was never
implied in any of the published UTC documents before.

I've collected well over 800 subdivision flags, and I'm sure there are
hundreds more, each with its own proud constituency. Vendors don't want
to bother adding a glyph for Saskatchewan or Neuquén or Yamagata? They
don't have to; they never had to. But now they're essentially being told
not to.

This was the only aspect of emoji I had the slightest interest in. Boo.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Philippe Verdy

So it's up to the UTC to create this encoding: this new release is a start
for a new vexillology registry (within encoded sequences) which creates a
new standard for them.

2017-03-27 23:50 GMT+02:00 Doug Ewell :

> Philippe Verdy wrote:
>
> > We still lack an encoding standard for vexillologists. And for now
> > only "Flags of the World" proposes some encoding (not based strictly
> > and only on ISO3166). I think that the UTC should try contacting
> > authors of Flags of the World and seek for advice there: we are
> > speaking here about regional flags (we can exclude some graphical
> > variants such as civil vs. navy flags vs honorific flags)
>
> As Philippe knows, because he and I had this discussion in 2012 and
> again in 2013:
>
> - I have already contacted FOTW.
> - They have no such encoding, except 3166-1 for countries and the 2-by-3
>   information code, and they have never proposed one.
> - I think such a standard would be a great idea, but
> - I don't think this is any of UTC's business and I'll bet they agree.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Peter Edberg

(this time from the correct account)

Philippe and others,
http://www.unicode.org/reports/tr51/tr51-11.html#valid-emoji-tag-sequences
refers to CLDR data for the list of valid subregion sequences; see
http://unicode.org/reports/tr35/index.html#Validity


CLDR data will maintain stable sequences in the event that ISO 3166-2 data 
changes.

- Peter E

> On Mar 27, 2017, at 1:39 PM, Philippe Verdy wrote:
> 
> Note also that ISO3166-2 is far from being stable, and this could contradict
> Unicode encoding stability: it would then be required to ensure this
> stability by only allowing sequences that are effectively registered in
> http://www.unicode.org/Public/emoji/5.0/emoji-sequences.txt (independently
> of the registration in ISO3166-2), and nothing is said if ever ISO3166-2
> obsoletes some codes and then some years later decides to reassign these codes
> to new entities: it should not be possible to do the same thing in Emoji
> sequences, and specific assignments will need to be made in the Unicode
> database.
>
> Note also that most recently created administrative divisions do not really
> adopt any flag, but if flags are used they may be reusing flags from older
> historic entities... or they could adopt only a logo (with legal protection,
> not really suitable for encoding in the UCS, as it won't be possible to
> define any "representative glyph" without asking the relevant authorities
> for permission to display some design, possibly simplified).
>
> We still lack an encoding standard for vexillologists. And for now only
> "Flags of the World" proposes some encoding (not based strictly and only on
> ISO3166). I think that the UTC should try contacting the authors of Flags of
> the World and seek advice there: we are speaking here about regional flags
> (we can exclude some graphical variants such as civil vs. navy flags vs.
> honorific flags).
> 
> 
> 2017-03-27 22:30 GMT+02:00 Philippe Verdy:
>
> 2017-03-27 21:17 GMT+02:00 Doug Ewell:
> announcements at Unicode dot org wrote:
> 
> > — and new regional flags for England, Scotland, and Wales.
> 
> It's not clear from this text, nor from the table in Section C.1.1 of
> the draft, what the status is of flag emoji tag sequences other than the
> three above.
> 
> Right, we've got them encoded as [GBENG], [GBSCT] and [GBWLS], but the codes
> used do not clearly specify which region code standard they are
> referring to. We just see that it's an ISO3166-1 country/territory code
> followed directly (without separator) by sequences of letters/digits, all of
> them converted to RIS and surrounded by the same initial emoji code and
> the DEL from RIS.
>
> The problem is how to choose the codes for the letters/digits in the second
> part, if they ever come from ISO3166-2 after dropping the hyphen separator
> (this is the case here, see https://en.wikipedia.org/wiki/ISO_3166-2:GB)
> or somewhere else.
>

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Philippe Verdy

And the new region of Normandie still has no formal code, but it reuses a
flag that was used by one of the two former regions.
Technically I don't see that as a problem, except that people may want to
display that flag using the code for the former region, and semantically
this is different (and also different from the former Duchy before it was
partly annexed by France and left the Channel Islands to the new English
Crown in the Middle Ages).
Even if we are concerned only with encoding modern entities, once these
sequences are encoded nobody will be able to restrict their reuse for
past entities (just as Unicode cannot rule against the use of a capital
Greek Alpha replacing a capital Latin A, or the fancy use of Latin for
"ASCII art", as Unicode does not encode orthographies or languages).
Once a sequence is registered, even if it is intended to represent a modern
entity, anyone will use it as they want. This also gives a hint
about why encoding stability will be important. But as we know,
regional or national entities change their flags and sometimes
reuse former flags from other entities. Sooner or later, there will be
confusion.
I would suggest that if renderers are capable of rendering colorful
flags and provide a UI, they should at least also render some hints,
notably the underlying code or a name if available, using for example
mouse-hover events to explain these flags and their intended usage: if a
former flag is reused by another entity, that new entity should have its
own encoding and the former flag should not be affected (its displayed
hint should still indicate a reference to its former meaning).

2017-03-27 23:32 GMT+02:00 Richard Wordingham <
richard.wording...@ntlworld.com>:

> On Mon, 27 Mar 2017 13:34:09 -0700
> Ken Whistler  wrote:
>
> > And if a flag of
> > California (or Pomerania or ...) then gets added to the list of emoji
> > tag sequences in a future version of the data, there is a good chance
> > that the "users" will then see the difference, because that flag will
> > appear on their phones eventually.
>
> Indeed, why isn't the flag of Texas there already so as to terminate
> the abuse of .  Technically, at least, it has the
> justification of being a formerly independent country, though I don't
> know that they have any national teams.
>
> Is anyone working on the issue of flags for the whole of Ireland?
> Different sports have their own 'national' flags.
>
> Pomerania will be a bit tricky, as it isn't any recent administrative
> division.
>
> Richard.
>

RE: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Doug Ewell

Philippe Verdy wrote:

> We still lack an encoding standard for vexillologists. And for now
> only "Flags of the World" proposes some encoding (not based strictly
> and only on ISO3166). I think that the UTC should try contacting
> authors of Flags of the World and seek for advice there: we are
> speaking here about regional flags (we can exclude some graphical
> variants such as civil vs. navy flags vs honorific flags)

As Philippe knows, because he and I had this discussion in 2012 and
again in 2013:

- I have already contacted FOTW.
- They have no such encoding, except 3166-1 for countries and the 2-by-3
  information code, and they have never proposed one.
- I think such a standard would be a great idea, but
- I don't think this is any of UTC's business and I'll bet they agree.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org

RE: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Doug Ewell

Ken Whistler wrote:

> As for how "users" are supposed to know the difference. Well, they
> don't. What matters is that the data file that the "implementers" will
> use has these 3 emoji tag sequences in it, so that is quite likely
> what everybody will see added to their phones. The "users" will just
> see 3 more flags.

So, no provision for a UI like the one I'm building, to let users select
a region or subdivision and generate the corresponding sequence? Mmh.
Well, anyway.

> And if they want a flag of California (or whatever), then they need to
> badger the platform vendors, who will then come back to the Emoji SC,
> saying, "Help! We need to add a flag of California, or people won't
> buy our phones!"

The way nobody will buy their phones unless they support all 5 skin
tones for all 3 flavors of "vampire" or "elf" or "fairy" or "person in
lotus position"? Those are also generative mechanisms, but not limited
to just a couple of combinations deemed worthy.

If flags have to be added one by one, a lot of them (including the
really useful ones, like California and Bavaria) will probably never
happen.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Richard Wordingham

On Mon, 27 Mar 2017 13:34:09 -0700
Ken Whistler  wrote:

> And if a flag of
> California (or Pomerania or ...) then gets added to the list of emoji
> tag sequences in a future version of the data, there is a good chance
> that the "users" will then see the difference, because that flag will
> appear on their phones eventually.

Indeed, why isn't the flag of Texas there already so as to terminate
the abuse of .  Technically, at least, it has the
justification of being a formerly independent country, though I don't
know that they have any national teams.

Is anyone working on the issue of flags for the whole of Ireland?
Different sports have their own 'national' flags.

Pomerania will be a bit tricky, as it isn't any recent administrative
division.

Richard.

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Ken Whistler




On 3/27/2017 1:39 PM, Philippe Verdy wrote:

> Note also that ISO3166-2 is far from being stable, and this could
> contradict Unicode encoding stability: it would then be required to
> ensure this stability by only allowing sequences that are effectively
> registered in
> http://www.unicode.org/Public/emoji/5.0/emoji-sequences.txt
> (independently of the registration in ISO3166-2), and nothing is said
> if ever ISO3166-2 obsoletes some codes and then some years later
> decides to reassign these codes to new entities: it should not be
> possible to do the same thing in Emoji sequences, and specific
> assignments will need to be made in the Unicode database.

These emoji tag sequences don't derive their stability from ISO 3166-2.

The emoji tag sequences depend on: CLDR Unicode Locale Identifiers, and 
more specifically, for these subregions, on the unicode_subdivision_id:


http://unicode.org/reports/tr35/index.html#unicode_subdivision_id

And the data for that is here:

http://unicode.org/repos/cldr/tags/latest/common/validity/subdivision.xml

The stability for such tags is baked into the CLDR repository, as I 
understand it.


By the way, if anybody is looking, Pomerania is there: "plpm" among the 
4925 other valid unicode_subdivision_id values. So:


Flag of Pomerania = 1F3F4 E0070 E006C E0070 E006D E007F

But alas, that is not a *valid*  emoji tag sequence (yet), so no soup 
for you!
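
For anyone wanting to reproduce the code point list above, here is a minimal sketch of the construction: U+1F3F4 as the base, one tag character per letter or digit of the subdivision id, then U+E007F as the terminator. The function names are invented for illustration, and the input check (only [0-9a-z]) is an assumption based on the proposed TR51 text quoted in this thread.

```python
TAG_BASE = 0x1F3F4    # WAVING BLACK FLAG
TAG_OFFSET = 0xE0000  # ASCII value plus this offset gives the tag character
CANCEL_TAG = 0xE007F  # terminates the tag sequence


def flag_tag_sequence(subdivision_id: str) -> list:
    """Return the code points of the emoji tag sequence for a subdivision id."""
    allowed = "0123456789abcdefghijklmnopqrstuvwxyz"
    if not subdivision_id or any(ch not in allowed for ch in subdivision_id):
        raise ValueError("expected a non-empty id over [0-9a-z]")
    tags = [TAG_OFFSET + ord(ch) for ch in subdivision_id]
    return [TAG_BASE] + tags + [CANCEL_TAG]


def as_hex(code_points) -> str:
    """Format code points the way the data files list them."""
    return " ".join(f"{cp:04X}" for cp in code_points)


print(as_hex(flag_tag_sequence("plpm")))
```

Building the sequence says nothing about whether any font or platform will display it as a flag.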


--Ken

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Philippe Verdy

Note also that ISO3166-2 is far from being stable, and this could
contradict Unicode encoding stability: it would then be required to ensure
this stability by only allowing sequences that are effectively registered
in http://www.unicode.org/Public/emoji/5.0/emoji-sequences.txt
(independently of the registration in ISO3166-2), and nothing is said if
ever ISO3166-2 obsoletes some codes and then some years later decides to
reassign these codes to new entities: it should not be possible to do the
same thing in Emoji sequences, and specific assignments will need to be
made in the Unicode database.

Note also that most recently created administrative divisions do not really
adopt any flag, but if flags are used they may be reusing flags from older
historic entities... or they could adopt only a logo (with legal
protection, not really suitable for encoding in the UCS, as it won't be
possible to define any "representative glyph" without asking the relevant
authorities for permission to display some design, possibly simplified).

We still lack an encoding standard for vexillologists. And for now only
"Flags of the World" proposes some encoding (not based strictly and only on
ISO3166). I think that the UTC should try contacting the authors of Flags of
the World and seek advice there: we are speaking here about regional
flags (we can exclude some graphical variants such as civil vs. navy flags
vs. honorific flags).

2017-03-27 22:30 GMT+02:00 Philippe Verdy :

>
>
> 2017-03-27 21:17 GMT+02:00 Doug Ewell :
>
>> announcements at Unicode dot org wrote:
>>
>> > — and new regional flags for England, Scotland, and Wales.
>>
>> It's not clear from this text, nor from the table in Section C.1.1 of
>> the draft, what the status is of flag emoji tag sequences other than the
>> three above.
>>
>
> Right, we've got them encoded as [GBENG], [GBSCT] and [GBWLS], but the
> codes used do not clearly specify which region code standard they are
> referring to. We just see that it's an ISO3166-1 country/territory code
> followed directly (without separator) by sequences of letters/digits, all of
> them converted to RIS and surrounded by the same initial emoji code and
> the DEL from RIS.
>
> The problem is how to choose the codes for the letters/digits in the second
> part, if they ever come from ISO3166-2 after dropping the hyphen separator
> (this is the case here, see https://en.wikipedia.org/wiki/ISO_3166-2:GB)
> or somewhere else.
>

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Ken Whistler



On 3/27/2017 12:17 PM, Doug Ewell wrote:

> announcements at Unicode dot org wrote:
>
> > ... and new regional flags for England, Scotland, and Wales.
>
> It's not clear from this text, nor from the table in Section C.1.1 of
> the draft, what the status is of flag emoji tag sequences other than the
> three above.
>
> I read the relevant section a couple of times and could not figure out
> how a "standard sequence" differs from a non-standard one, or how
> ordinary users are supposed to know the difference. The term "standard
> sequence" appears nowhere in the draft except as a table header.


The terminology is still a bit in flux, which is why the text of UTS #51 
is still under review, before being finalized at the UTC meeting in May.


But the data for Emoji 5.0 is final, and there are precisely 3 "emoji 
tag sequences" in the relevant data file:


http://www.unicode.org/Public/emoji/5.0/emoji-sequences.txt

As for how "users" are supposed to know the difference: well, they 
don't. What matters is that the data file that the "implementers" will 
use has these 3 emoji tag sequences in it, so that is quite likely what 
everybody will see added to their phones. The "users" will just see 3 
more flags. And if they want a flag of California (or whatever), then 
they need to badger the platform vendors, who will then come back to the 
Emoji SC, saying, "Help! We need to add a flag of California, or people 
won't buy our phones!" And if a flag of California (or Pomerania or ...) 
then gets added to the list of emoji tag sequences in a future version 
of the data, there is a good chance that the "users" will then see the 
difference, because that flag will appear on their phones eventually.


Anybody could *attempt* to convey a flag of Pomerania (a rather handsome 
black gryphon on a yellow background, btw) with an emoji tag sequence 
right now, I suppose. Good luck on any input support or actual 
interoperability or availability in any font on any standard platform, 
however. You'd just get fallback display. If conveying flags of 
Pomerania is in your near term future, I'd advise sticking to images. ;-)
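To make the distinction between a well-formed sequence and a listed one concrete, here is a minimal Python sketch. It assumes the layout used by the three sequences in the data file: U+1F3F4 as the base, tag letters or digits at an offset of 0xE0000 from ASCII, and U+E007F as the terminator. The names decode_flag_tags, is_listed and LISTED are hypothetical, and "usca" (ISO 3166-2 US-CA with the hyphen dropped) is only an illustrative guess at what a California sequence would contain.

```python
from typing import Optional

TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset
BLACK_FLAG = "\U0001F3F4"     # U+1F3F4 WAVING BLACK FLAG (base character)
CANCEL_TAG = "\U000E007F"     # U+E007F CANCEL TAG (terminator)

# Assumption: the three identifiers behind the England, Scotland and
# Wales sequences discussed in this thread.
LISTED = {"gbeng", "gbsct", "gbwls"}


def decode_flag_tags(seq: str) -> Optional[str]:
    """Return the ASCII identifier carried by a flag tag sequence,
    or None if the string does not have the expected shape."""
    if len(seq) < 3 or not seq.startswith(BLACK_FLAG) or not seq.endswith(CANCEL_TAG):
        return None
    body = seq[1:-1]
    for ch in body:
        cp = ord(ch)
        is_tag_digit = 0xE0030 <= cp <= 0xE0039
        is_tag_letter = 0xE0061 <= cp <= 0xE007A
        if not (is_tag_digit or is_tag_letter):
            return None
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in body)


def is_listed(seq: str) -> bool:
    """True only if the sequence decodes to one of the listed identifiers."""
    return decode_flag_tags(seq) in LISTED


def _tags(ident: str) -> str:
    """Helper for the demo: map ASCII letters/digits to tag characters."""
    return "".join(chr(TAG_OFFSET + ord(ch)) for ch in ident)


wales = BLACK_FLAG + _tags("gbwls") + CANCEL_TAG
california = BLACK_FLAG + _tags("usca") + CANCEL_TAG
print(is_listed(wales), is_listed(california))  # True False
```

The California string decodes cleanly but is not in the listed set, which is the situation described above: nothing stops anyone from constructing it, but nothing obliges a font or keyboard to support it either.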


--Ken



> Vendors always have the option of supporting or not supporting a glyph
> for any code point or sequence -- note 4 in Section C.1 and the second
> sentence in C.1.1 both reinforce this long-standing principle -- so
> there must be something more here.

Re: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Philippe Verdy

2017-03-27 21:17 GMT+02:00 Doug Ewell :

> announcements at Unicode dot org wrote:
>
> > — and new regional flags for England, Scotland, and Wales.
>
> It's not clear from this text, nor from the table in Section C.1.1 of
> the draft, what the status is of flag emoji tag sequences other than the
> three above.
>

Right, we've got them encoded as [GBENG], [GBSCT] and [GBWLS], but the
codes used do not specify clearly which region code standard they are
referring to. We just see that it's an ISO 3166-1 country/territory code
followed directly (without separator) by sequences of letters/digits, all of
them converted to RIS and surrounded by the same initial emoji code and
the DEL from RIS.

The problem is how to choose the codes for the letters/digits in the second
part: whether they come from ISO 3166-2 after dropping the hyphen separator
(this is the case here, see https://en.wikipedia.org/wiki/ISO_3166-2:GB) or
from somewhere else.
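
For reference, the construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: as far as I can tell from the Emoji 5.0 data, the letters are carried as tag characters (ASCII value plus 0xE0000) rather than regional indicator symbols, the base is U+1F3F4 WAVING BLACK FLAG, and the terminator is U+E007F CANCEL TAG. The function name subdivision_flag is hypothetical.

```python
BLACK_FLAG = 0x1F3F4   # U+1F3F4 WAVING BLACK FLAG, the base character
TAG_OFFSET = 0xE0000   # tag characters mirror ASCII at U+E0020..U+E007E
CANCEL_TAG = 0xE007F   # U+E007F CANCEL TAG terminates the sequence


def subdivision_flag(iso_3166_2_code: str) -> str:
    """Build a flag tag sequence from a code such as 'GB-ENG'.

    The hyphen is dropped and the rest is lowercased before each
    character is shifted into the tag block.
    """
    ident = iso_3166_2_code.replace("-", "").lower()
    if not ident.isascii() or not ident.isalnum():
        raise ValueError(f"unexpected subdivision code: {iso_3166_2_code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in ident)
    return chr(BLACK_FLAG) + tags + chr(CANCEL_TAG)


england = subdivision_flag("GB-ENG")
# Prints: 1F3F4 E0067 E0062 E0065 E006E E0067 E007F
print(" ".join(f"{ord(ch):04X}" for ch in england))
```

Nothing in the sequence itself records which code standard the identifier was taken from, which is the ambiguity raised above.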

RE: Unicode Emoji 5.0 characters now final

2017-03-27 Thread Doug Ewell

announcements at Unicode dot org wrote:

> — and new regional flags for England, Scotland, and Wales.

It's not clear from this text, nor from the table in Section C.1.1 of
the draft, what the status is of flag emoji tag sequences other than the
three above.

I read the relevant section a couple of times and could not figure out
how a "standard sequence" differs from a non-standard one, or how
ordinary users are supposed to know the difference. The term "standard
sequence" appears nowhere in the draft except as a table header.

Vendors always have the option of supporting or not supporting a glyph
for any code point or sequence -- note 4 in Section C.1 and the second
sentence in C.1.1 both reinforce this long-standing principle -- so
there must be something more here.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org 
