Re: Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10))
On January 5, Mark Davis wrote:

> Doug, I modified my working draft, at
> https://docs.google.com/document/d/1EuNjbs0XrBwqlvCJxra44o3EVrwdBJUWsPf8Ec1fWKY
> If that looks ok, I'll submit.

Sorry for the delay. The text substitutions look fine.

--
Doug Ewell | Thornton, CO, US | ewellic.org
Re: Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10))
Doug, I modified my working draft, at
https://docs.google.com/document/d/1EuNjbs0XrBwqlvCJxra44o3EVrwdBJUWsPf8Ec1fWKY

If that looks ok, I'll submit. Thanks again for your comments.

Mark

On Wed, Jan 3, 2018 at 9:29 AM, Mark Davis ☕️ wrote:

> Thanks for your comments; you raise an excellent issue. There are valid
> sequences that are not RGI; a vendor can support additional emoji sequences
> (in particular, flags). So the wording in the doc isn't correct.
>
> It should do something like replace the use of "testing for RGI" by
> "testing for validity". The key areas involved in that are checking for the
> valid base+modifier combinations, valid RI pairs, and TAG sequences. The
> latter two involve testing based on the information applied in the
> appendix, while the valid base+modifiers are more regular and can be tested
> based on properties.
>
> Mark
>
> On Tue, Jan 2, 2018 at 9:55 PM, Doug Ewell via Unicode
> <unicode@unicode.org> wrote:
>
>> Mark Davis wrote:
>>
>>> BTW, relevant to this discussion is a proposal filed
>>> http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf
>>> (The date is wrong, should be 2017-12-22)
>>
>> The phrase "emoji regex" had caused me to ignore this document, but I
>> took a look based on this thread. It says "we still depend on the RGI test
>> to filter the set of emoji sequences" and proposes that the EBNF in UTS #51
>> be simplified on the basis that only RGI sequences will pass the "possible
>> emoji" test anyway.
>>
>> Thus it is true, as some people have said (i.e. in L2/17-382), that
>> non-RGI sequences do not actually count as emoji, and therefore there is no
>> way, not merely no "recommended" way, to represent the flags of entities
>> such as Catalonia and Brittany.
>>
>> In 2016 I had asked for the emoji tag sequence mechanism for flags to be
>> available for all CLDR subdivisions, not just three, with the understanding
>> that the vast majority would not be supported by vendor glyphs. It is
>> unfortunate that, while the conciliatory name "recommended" was adopted for
>> the three, the intent of "exclusively permitted" was retained.
>>
>> --
>> Doug Ewell | Thornton, CO, US | ewellic.org
Re: Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10))
Thanks for your comments; you raise an excellent issue. There are valid sequences that are not RGI; a vendor can support additional emoji sequences (in particular, flags). So the wording in the doc isn't correct.

It should do something like replace the use of "testing for RGI" by "testing for validity". The key areas involved in that are checking for the valid base+modifier combinations, valid RI pairs, and TAG sequences. The latter two involve testing based on the information applied in the appendix, while the valid base+modifiers are more regular and can be tested based on properties.

Mark

On Tue, Jan 2, 2018 at 9:55 PM, Doug Ewell via Unicode wrote:

> Mark Davis wrote:
>
>> BTW, relevant to this discussion is a proposal filed
>> http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf
>> (The date is wrong, should be 2017-12-22)
>
> The phrase "emoji regex" had caused me to ignore this document, but I took
> a look based on this thread. It says "we still depend on the RGI test to
> filter the set of emoji sequences" and proposes that the EBNF in UTS #51 be
> simplified on the basis that only RGI sequences will pass the "possible
> emoji" test anyway.
>
> Thus it is true, as some people have said (i.e. in L2/17-382), that
> non-RGI sequences do not actually count as emoji, and therefore there is no
> way, not merely no "recommended" way, to represent the flags of entities
> such as Catalonia and Brittany.
>
> In 2016 I had asked for the emoji tag sequence mechanism for flags to be
> available for all CLDR subdivisions, not just three, with the understanding
> that the vast majority would not be supported by vendor glyphs. It is
> unfortunate that, while the conciliatory name "recommended" was adopted for
> the three, the intent of "exclusively permitted" was retained.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
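
The difference between "testing for RGI" and "testing for validity" in the message above can be made concrete with a rough Python sketch covering the three areas Mark lists (base+modifier, RI pairs, tag sequences). Everything named SAMPLE_* is a tiny hypothetical stand-in for the real property and appendix data, the subdivision ids "esct" and "frbre" are assumed spellings for Catalonia and Brittany, and the function names are made up for illustration. This is not the algorithm from the working draft.

```python
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF              # REGIONAL INDICATOR SYMBOL LETTER A..Z
MODIFIER_FIRST, MODIFIER_LAST = 0x1F3FB, 0x1F3FF  # Emoji_Modifier (skin tones)
TAG_BASE = 0x1F3F4                                # WAVING BLACK FLAG
TAG_SPEC_FIRST, TAG_SPEC_LAST = 0xE0020, 0xE007E  # tag characters allowed in the payload
TAG_TERM = 0xE007F                                # CANCEL TAG

# Hypothetical sample tables (assumptions, not the real data files).
SAMPLE_MODIFIER_BASES = {0x1F44B, 0x1F44D}        # stand-in for Emoji_Modifier_Base
SAMPLE_VALID_REGIONS = {"US", "FR", "ES", "GB"}
SAMPLE_VALID_SUBDIVISIONS = {"gbeng", "gbsct", "gbwls", "esct", "frbre"}
# The three subdivision flags that are RGI (England, Scotland, Wales).
RGI_SUBDIVISIONS = {"gbeng", "gbsct", "gbwls"}


def tag_payload(cps):
    """Return the ASCII payload of a well-formed tag sequence, else None."""
    if len(cps) < 3 or cps[0] != TAG_BASE or cps[-1] != TAG_TERM:
        return None
    body = cps[1:-1]
    if not all(TAG_SPEC_FIRST <= cp <= TAG_SPEC_LAST for cp in body):
        return None
    return "".join(chr(cp - 0xE0000) for cp in body)


def is_valid(seq):
    """Validity test, relative to the sample tables above."""
    cps = [ord(ch) for ch in seq]
    # RI pair: needs a lookup in a table of region codes.
    if len(cps) == 2 and all(RI_FIRST <= cp <= RI_LAST for cp in cps):
        region = "".join(chr(cp - RI_FIRST + ord("A")) for cp in cps)
        return region in SAMPLE_VALID_REGIONS
    # Base + modifier: regular, decided by properties alone.
    if len(cps) == 2 and MODIFIER_FIRST <= cps[1] <= MODIFIER_LAST:
        return cps[0] in SAMPLE_MODIFIER_BASES
    # Tag sequence: needs a lookup in a table of subdivision ids.
    return tag_payload(cps) in SAMPLE_VALID_SUBDIVISIONS


def is_rgi_subdivision_flag(seq):
    """Narrower test: only the three recommended subdivision flags pass."""
    return tag_payload([ord(ch) for ch in seq]) in RGI_SUBDIVISIONS


def make_tag_flag(code):
    """Helper for the usage example: build a tag sequence from an ASCII id."""
    return chr(TAG_BASE) + "".join(chr(0xE0000 + ord(c)) for c in code) + chr(TAG_TERM)


if __name__ == "__main__":
    for code in ("gbeng", "esct"):
        flag = make_tag_flag(code)
        print(code, "valid:", is_valid(flag), "RGI:", is_rgi_subdivision_flag(flag))
```

Under these assumptions a Catalonia flag sequence passes the validity test but fails the RGI test, which is exactly the case the old wording excluded.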
Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10))
Mark Davis wrote:

> BTW, relevant to this discussion is a proposal filed
> http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf
> (The date is wrong, should be 2017-12-22)

The phrase "emoji regex" had caused me to ignore this document, but I took a look based on this thread. It says "we still depend on the RGI test to filter the set of emoji sequences" and proposes that the EBNF in UTS #51 be simplified on the basis that only RGI sequences will pass the "possible emoji" test anyway.

Thus it is true, as some people have said (i.e. in L2/17-382), that non-RGI sequences do not actually count as emoji, and therefore there is no way, not merely no "recommended" way, to represent the flags of entities such as Catalonia and Brittany.

In 2016 I had asked for the emoji tag sequence mechanism for flags to be available for all CLDR subdivisions, not just three, with the understanding that the vast majority would not be supported by vendor glyphs. It is unfortunate that, while the conciliatory name "recommended" was adopted for the three, the intent of "exclusively permitted" was retained.

--
Doug Ewell | Thornton, CO, US | ewellic.org
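
For readers unfamiliar with the emoji tag sequence mechanism mentioned above, here is a small Python sketch of how a subdivision flag is spelled: U+1F3F4 WAVING BLACK FLAG, then one tag character (U+E0000 plus the ASCII value) per character of the lowercase subdivision id, then U+E007F CANCEL TAG. The function names are hypothetical, and deriving the id by dropping the hyphen from an ISO 3166-2 style code such as "ES-CT" is an assumption made for illustration.

```python
TAG_BASE = "\U0001F3F4"     # WAVING BLACK FLAG
CANCEL_TAG = "\U000E007F"   # terminates the tag sequence


def subdivision_flag(code):
    """Build the emoji tag sequence for a subdivision id such as 'gbeng' or 'GB-ENG'."""
    normalized = code.replace("-", "").lower()
    if not (normalized.isascii() and normalized.isalnum()):
        raise ValueError(f"not an ASCII alphanumeric subdivision id: {code!r}")
    # Each ASCII character maps to the tag character at U+E0000 + its code point.
    tags = "".join(chr(0xE0000 + ord(ch)) for ch in normalized)
    return TAG_BASE + tags + CANCEL_TAG


def code_points(seq):
    """Render a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in seq)


if __name__ == "__main__":
    for code in ("GB-ENG", "ES-CT", "FR-BRE"):
        print(code, "->", code_points(subdivision_flag(code)))
```

The mechanism itself treats all three inputs alike; only the first is among the three recommended sequences, so whether the other two render as flags depends on vendor support.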