RE: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences

2020-08-17 Thread Shawn Steele via Unicode
Of Henri Sivonen via Unicode Sent: Sunday, August 16, 2020 11:39 PM To: Mark Davis ☕️ Cc: Unicode Public Subject: Re: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences Sorry about the delay. There is now https://www.unicode.org/L2/L2020/20202-empty-iso-2022-jp.pdf

Re: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences

2020-08-17 Thread Harriet Riddle via Unicode
the WHATWG logic is to encode the next character in the current codeset if possible, and switch to another if it is not. -- Har From: Unicode on behalf of Henri Sivonen via Unicode Sent: 17 August 2020 08:38 To: Mark Davis ☕️ Cc: Unicode Public Subject: Re: G

Re: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences

2020-08-17 Thread Henri Sivonen via Unicode
/tr36/? > > Mark > > > On Mon, Dec 10, 2018 at 11:10 AM Henri Sivonen via Unicode > wrote: >> >> We're about to remove the U+FFFD generation for the case where there >> is no content between two ISO-2022-JP escape sequences from the WHATWG >> Encoding

RE: Emoji map of Colorado

2020-04-02 Thread Doug Ewell via Unicode
Karl Williamson shared: > https://www.reddit.com/r/Denver/comments/fsmn87/quarantine_boredom_my_emoji_map_of_colorado/?mc_cid=365e908e08_eid=0700c8706b It's too bad this was only made available as an image, not as text, which of course it is. -- Doug Ewell | Thornton, CO, US | ewellic.org

Emoji map of Colorado

2020-04-01 Thread Karl Williamson via Unicode
https://www.reddit.com/r/Denver/comments/fsmn87/quarantine_boredom_my_emoji_map_of_colorado/?mc_cid=365e908e08_eid=0700c8706b

How is meaning changed by context and typography - in art, emoji and language

2020-04-01 Thread wjgo_10...@btinternet.com via Unicode
I received a circulated email from MoMA, the Museum of Modern Art in New York. I am, at my request, on their mailing list. There is a link to a web page. https://www.moma.org/magazine/articles/257 There is a video embedded in the web page, 8 minutes. I watched the video and found it

Base character plus tag sequences (from RE: Is the binaryness/textness of a data format a property?)

2020-03-23 Thread wjgo_10...@btinternet.com via Unicode
Doug Ewell wrote: When 137,468 private-use characters aren't enough? In my opinion, a base character plus tag sequence has the potential to be used for many large scale applications for the future. A base character plus tag sequence encoding has the advantage over a Private Use Area encoding

Re: Is the binaryness/textness of a data format a property?

2020-03-22 Thread Martin J . Dürst via Unicode
On 23/03/2020 03:56, Markus Scherer via Unicode wrote: > On Sat, Mar 21, 2020 at 12:35 PM Doug Ewell via Unicode > wrote: > >> I thought the whole premise of GB18030 was that it was Unicode mapped into >> a GB2312 framework. What characters exist in GB18030 that don'

Re: Is the binaryness/textness of a data format a property?

2020-03-22 Thread Markus Scherer via Unicode
On Sat, Mar 21, 2020 at 12:35 PM Doug Ewell via Unicode wrote: > I thought the whole premise of GB18030 was that it was Unicode mapped into > a GB2312 framework. What characters exist in GB18030 that don't exist in > Unicode, and have they been proposed for Unicode yet, and why

Re: Is the binaryness/textness of a data format a property?

2020-03-21 Thread Richard Wordingham via Unicode
On Sat, 21 Mar 2020 13:33:18 -0600 Doug Ewell via Unicode wrote: > Eli Zaretskii wrote: > > Emacs uses some of that for supporting charsets that cannot be > > mapped into Unicode. GB18030 is one example of such charsets. The > > internal representation of characters

RE: Is the binaryness/textness of a data format a property?

2020-03-21 Thread Doug Ewell via Unicode
Eli Zaretskii wrote: >> When 137,468 private-use characters aren't enough? > > Why is that relevant to the issue at hand? You're right. I did ask what the uses of non-standard UTF-8 were, and you gave me an example. > I don't remember off hand, but last time I looked at GB18030, there > were a

Re: Is the binaryness/textness of a data format a property?

2020-03-21 Thread Julian Bradfield via Unicode
On 2020-03-21, Eli Zaretskii via Unicode wrote: >> Date: Sat, 21 Mar 2020 11:13:40 -0600 >> From: Doug Ewell via Unicode >> >> Adam Borowski wrote: >> >> > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF >> > or U+11

Re: Is the binaryness/textness of a data format a property?

2020-03-21 Thread Eli Zaretskii via Unicode
> From: "Doug Ewell" > Cc: > Date: Sat, 21 Mar 2020 13:33:18 -0600 > > > Emacs uses some of that for supporting charsets that cannot be mapped > > into Unicode. GB18030 is one example of such charsets. The internal > > representation of characters in Emacs is UTF-8, so it uses 5-byte > >

RE: Is the binaryness/textness of a data format a property?

2020-03-21 Thread Doug Ewell via Unicode
Eli Zaretskii wrote: >>> Also, UTF-8 can carry more than Unicode -- for example, >>> U+D800..U+DFFF or U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or >>> 2⁴²), which has its uses but is not well-formed Unicode. >> >> I'd be interested in your elaboration on what these uses are. > > Emacs uses

Re: Is the binaryness/textness of a data format a property?

2020-03-21 Thread Eli Zaretskii via Unicode
> Date: Sat, 21 Mar 2020 11:13:40 -0600 > From: Doug Ewell via Unicode > > Adam Borowski wrote: > > > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF > > or U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or 2⁴²), which has > > its us

Re: Is the binaryness/textness of a data format a property?

2020-03-21 Thread Doug Ewell via Unicode
Adam Borowski wrote: > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF > or U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or 2⁴²), which has > its uses but is not well-formed Unicode. I'd be interested in your elaboration on what these uses are. -- Doug Ewell |

Re: Is the binaryness/textness of a data format a property?

2020-03-20 Thread Martin J . Dürst via Unicode
On 20/03/2020 23:41, Adam Borowski via Unicode wrote: > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF or > U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or 2⁴²), which has its uses > but is not well-formed Unicode. This would definitely no longer be UTF-8! Martin.
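Editorial note: the point of this reply can be checked with a strict decoder. The sketch below is only an illustration, not part of the thread. It assumes Python 3's built-in "utf-8" codec as the strict decoder, and the helper name is_well_formed_utf8 is hypothetical. It hand-builds two byte sequences that follow the UTF-8 bit layout but encode a surrogate (U+D800) and a value just past U+10FFFF (U+110000), then shows that both are rejected.

```python
# Byte sequences that follow the UTF-8 bit layout but encode values
# outside the set of Unicode scalar values.
surrogate = b"\xed\xa0\x80"        # bit pattern for U+D800 (a surrogate)
beyond_max = b"\xf4\x90\x80\x80"   # bit pattern for U+110000 (> U+10FFFF)


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_well_formed_utf8("é".encode("utf-8")))  # ordinary text is accepted
print(is_well_formed_utf8(surrogate))            # encoded surrogate is rejected
print(is_well_formed_utf8(beyond_max))           # value past U+10FFFF is rejected
```

Software that wants to carry such values (as described for Emacs elsewhere in the thread) therefore has to use its own extended encoding rather than a conforming UTF-8 decoder.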

Re: Is the binaryness/textness of a data format a property?

2020-03-20 Thread Richard Wordingham via Unicode
On Fri, 20 Mar 2020 13:46:25 +0100 Adam Borowski via Unicode wrote: > On Fri, Mar 20, 2020 at 12:21:26PM +, Costello, Roger L. via > Unicode wrote: > > [Definition] Property: an attribute, quality, or characteristic of > > something. > > > > JPEG is a binary

Re: Is the binaryness/textness of a data format a property?

2020-03-20 Thread Adam Borowski via Unicode
On Fri, Mar 20, 2020 at 07:22:45AM -0700, J Decker via Unicode wrote: > On Fri, Mar 20, 2020 at 5:48 AM Adam Borowski via Unicode < > > For example, most Unix-heads will tell you that UTF16LE is a binary rather > > than text format. Microsoft employees and some me

Re: Is the binaryness/textness of a data format a property?

2020-03-20 Thread J Decker via Unicode
On Fri, Mar 20, 2020 at 5:48 AM Adam Borowski via Unicode < unicode@unicode.org> wrote: > On Fri, Mar 20, 2020 at 12:21:26PM +, Costello, Roger L. via Unicode > wrote: > > [Definition] Property: an attribute, quality, or characteristic of > something. > > >

Re: Is the binaryness/textness of a data format a property?

2020-03-20 Thread Adam Borowski via Unicode
On Fri, Mar 20, 2020 at 12:21:26PM +, Costello, Roger L. via Unicode wrote: > [Definition] Property: an attribute, quality, or characteristic of something. > > JPEG is a binary data format. > CSV is a text data format. > > Question #1: Is the binaryness/textness of a data

AW: Is the binaryness/textness of a data format a property?

2020-03-20 Thread Dreiheller, Albrecht via Unicode
#1: Yes. #2: [ my suggestion ] File type category A.D. -Ursprüngliche Nachricht- Von: Unicode Im Auftrag von Costello, Roger L. via Unicode Gesendet: Freitag, 20. März 2020 13:21 An: unicode@unicode.org Betreff: Is the binaryness/textness of a data format a property? Hello Data

Is the binaryness/textness of a data format a property?

2020-03-20 Thread Costello, Roger L. via Unicode
Hello Data Format Experts! [Definition] Property: an attribute, quality, or characteristic of something. JPEG is a binary data format. CSV is a text data format. Question #1: Is the binaryness/textness of a data format a property? Question #2: If the answer to Question #1 is yes, then what is

EGYPTIAN HIEROGLYPH MAN WITH A ROLL OF TOILET PAPER

2020-03-11 Thread Karl Williamson via Unicode
On 2/12/20 11:12 AM, Frédéric Grosshans via Unicode wrote: Dear Unicode list members (CC Michel Suignard),   the Unicode proposal L2/20-068 <https://www.unicode.org/L2/L2020/20068-n5128-ext-hieroglyph.pdf>, “Revised draft for the encoding of an extended Egyptian Hieroglyphs repe

Re: UAX #29 and WB4

2020-03-09 Thread Andy Heninger via Unicode
it no-break rules on both sides instead. -- Andy On Wed, Mar 4, 2020 at 4:01 PM Mark Davis ☕️ via Unicode < unicode@unicode.org> wrote: > One thing we have considered for a while is whether to do a rewrite of the > rules to simplify the processing (and avoid the "treat as" rul

Reminder about reporting bugs, errors, and other feedback

2020-03-07 Thread Rick McGowan via Unicode
Hello everyone... This is just a little public service reminder that discussions on the Unicode mail list are not considered official feedback, and are not reviewed by UTC members or staff as a source for bug reports. If you want to make sure your feedback and/or report gets into the UTC

UAX #29 6.2

2020-03-07 Thread Zack Newman via Unicode
According to 6.2, "thus ignoring Extend is sufficient to disallow breaking within a grapheme cluster." However the sequence of Unicode scalar values (U+0600, U+0020) is considered a single grapheme cluster due to rule GB9, but the sequence is parsed into two words according to 4.1.1. While it

Re: UAX #29 and WB4

2020-03-04 Thread Mark Davis ☕️ via Unicode
et us know. Mark On Wed, Mar 4, 2020 at 11:30 AM Daniel Bünzli via Unicode < unicode@unicode.org> wrote: > On 4 March 2020 at 18:48:09, Daniel Bünzli (daniel.buen...@erratique.ch) > wrote: > > > On 4 March 2020 at 18:01:25, Daniel Bünzli (daniel.buen...@erratique.ch) > wrot

Re: UAX #29 and WB4

2020-03-04 Thread Daniel Bünzli via Unicode
On 4 March 2020 at 18:48:09, Daniel Bünzli (daniel.buen...@erratique.ch) wrote: > On 4 March 2020 at 18:01:25, Daniel Bünzli (daniel.buen...@erratique.ch) > wrote: > > > Re-reading the text I suspect I should not restart the rules from the first > > one when a > WB4 > > rewrite occurs but

Re: UAX #29 and WB4

2020-03-04 Thread Daniel Bünzli via Unicode
On 4 March 2020 at 18:01:25, Daniel Bünzli (daniel.buen...@erratique.ch) wrote: > Re-reading the text I suspect I should not restart the rules from the first > one when a WB4 > rewrite occurs but only apply the subsequent rules. Is that correct ? However even if that's correct I don't

UAX #29 and WB4

2020-03-04 Thread Daniel Bünzli via Unicode
Hello,  My implementation of word break chokes only on the following test case from the file [1]:  ÷ 0020 × 0308 ÷ 0020 ÷ #  ÷ [0.2] SPACE (WSegSpace) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] SPACE (WSegSpace) ÷ [0.3]  I find:  ÷ 0020 × 0308 × 0020 ÷ Basically my implementation

Re: UAX #14 for 13.0.0: LB27 first's line is obsolete

2020-03-03 Thread Andy Heninger via Unicode
I agree. The LB27 first part rule (JL | JV | JT | H2 | H3) × IN appears to be redundant. Good catch. -- Andy On Tue, Mar 3, 2020 at 1:53 PM Daniel Bünzli wrote: > Hello, > > I think (more precisely my compiler thinks [1]) the first line of LB27 is > already handled by the new LB22 rule

UAX #14 for 13.0.0: LB27 first's line is obsolete

2020-03-03 Thread Daniel Bünzli via Unicode
Hello,  I think (more precisely my compiler thinks [1]) the first line of LB27 is already handled by the new LB22 rule and can be removed.  Best,  Daniel [1] File "uuseg_line_break.ml", line 206, characters 38-40: 206 |   | (* LB27 *)  _, (JL|JV|JT|H2|H3), (IN|PO) -> no_boundary s            

Re: Why do binary files contain text but text files don't contain binary?

2020-02-21 Thread Hans Åberg via Unicode
> On 21 Feb 2020, at 13:21, Costello, Roger L. via Unicode > wrote: > > There are binary files and there are text files. In C, when opening a file as binary with the function fopen, the newlines are untranslated [1]. If not using this option, the file is informally text,

RE: Why do binary files contain text but text files don't contain binary?

2020-02-21 Thread Doug Ewell via Unicode
Costello, Roger L. wrote: > Text files may indeed contain binary (i.e., bytes that are not > interpretable as characters). Namely, text files may contain newlines, > tabs, and some other invisible things. > > Question: "characters" are defined as only the visible things, right? In addition to this

Re: Why do binary files contain text but text files don't contain binary?

2020-02-21 Thread Ken Whistler via Unicode
On 2/21/2020 7:53 AM, Costello, Roger L. via Unicode wrote: Text files may indeed contain binary (i.e., bytes that are not interpretable as characters). Namely, text files may contain newlines, tabs, and some other invisible things. Question: "characters" are defined as only t

Re: Why do binary files contain text but text files don't contain binary?

2020-02-21 Thread Richard Wordingham via Unicode
On Fri, 21 Feb 2020 15:53:52 + "Costello, Roger L. via Unicode" wrote: > Based on a private correspondence, I now realize that this statement: > > > > > Text files do not contain binary > > > > is not correct. > > >

RE: Why do binary files contain text but text files don't contain binary?

2020-02-21 Thread Costello, Roger L. via Unicode
Based on a private correspondence, I now realize that this statement: > Text files do not contain binary is not correct. Text files may indeed contain binary (i.e., bytes that are not interpretable as characters). Namely, text files may contain newlines, tabs, and some other invisible

Re: Why do binary files contain text but text files don't contain binary?

2020-02-21 Thread via Unicode
Dear Roger, because in when unicode is used in real life, utf8 etc then text ⊂ binary John Knightley On 2020-02-21 20:21, Costello, Roger L. via Unicode wrote: Hi Folks, There are binary files and there are text files. Binary files often contain portions that are text. For example

Why do binary files contain text but text files don't contain binary?

2020-02-21 Thread Costello, Roger L. via Unicode
Hi Folks, There are binary files and there are text files. Binary files often contain portions that are text. For example, the start of Windows executable files is the text MZ. To the best of my knowledge, text files never contain binary, i.e., bytes that cannot be interpreted as characters.

Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-15 Thread wjgo_10...@btinternet.com via Unicode
Message -- From: "via Unicode" To: wjgo_10...@btinternet.com Cc: unicode@unicode.org Sent: Saturday, 2020 Feb 15 At 10:11 Subject: Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop) Hi William, I don't fully understand your propose

Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-15 Thread via Unicode
? Couldn't you simply capitalize on the rules that already exist for entities? Best wishes, jk -- Joel Kalvesmaki Director, Text Alignment Network http://textalign.net On 2020-02-14 15:52, wjgo_10...@btinternet.com via Unicode wrote: The solution is to invent my own encoding space. This sits on top

Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-14 Thread wjgo_10...@btinternet.com via Unicode
The solution is to invent my own encoding space. This sits on top of Unicode, could be (perhaps?) called markup, but it works! It may be perilous, because some software may enforce the strict official code point limits. I have now realized that what I wrote before is ambiguous. When I

Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-14 Thread Hans Åberg via Unicode
> On 13 Feb 2020, at 16:41, wjgo_10...@btinternet.com via Unicode > wrote: > > Yet a Private Use Area encoding at a particular code point is not unique. > Thus, except with care amongst people who are aware of the particular > encoding, there is no interoperability, su

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-14 Thread Adam Borowski via Unicode
On Thu, Feb 13, 2020 at 09:15:18PM +, Richard Wordingham via Unicode wrote: > On Thu, 13 Feb 2020 20:15:07 + > Shawn Steele via Unicode wrote: > > > I confess that even though I know nothing about Hieroglyphs, that I > > find it fascinating that such a thoroughly de

Aw: RE: Egyptian Hieroglyph Man with a Laptop

2020-02-14 Thread Marius Spix via Unicode
That glyph is coded on position U+1F5B3 OLD PERSONAL COMPUTER, see http://users.teilar.gr/~g1951d/Aegyptus.pdf     Gesendet: Donnerstag, 13. Februar 2020 um 07:58 Uhr Von: "うみほたる via Unicode" An: unicode@unicode.org Betreff: RE: Egyptian Hieroglyph Man with a Laptop The earl

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread via Unicode
is a living script so new characters come and go, not all become widespread in their use. "Egyptologist" is certainly an outlier, and certainly strange to me. One question is what do "Egyptologists" think of it. John On 2020-02-14 08:13, Ken Whistler via Unicode wrote: Well, no, i

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Ken Whistler via Unicode
Well, no, in this case "strange" means strange, as Ken Lunde notes. I'm just pointing to his list, because it pulls together quite a few Han characters that *also* have dubious cases for encoding. Or you could turn the argument around, I suppose, and note that just because the hieroglyph for

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread via Unicode
Dear Ken An interesting comparison, if strange means dubious, then the name kstrange should be changed or some of the content removed because many of the characters in the set are not dubious in the least. Regards John On 2020-02-14 04:08, Ken Whistler via Unicode wrote: You want "du

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Richard Wordingham via Unicode
On Thu, 13 Feb 2020 20:15:07 + Shawn Steele via Unicode wrote: > I confess that even though I know nothing about Hieroglyphs, that I > find it fascinating that such a thoroughly dead script might still be > living in some way, even if it's only a little bit. Plenty of people have l

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Asmus Freytag via Unicode
On 2/12/2020 3:26 PM, Shawn Steele via Unicode wrote: From the point of view of Unicode, it is simpler: If the character is in use or have had use, it should be included somehow. That bar, to me, seems too low. Many things are only used

RE: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Shawn Steele via Unicode
- From: Unicode On Behalf Of Ken Whistler via Unicode Sent: Thursday, February 13, 2020 12:08 PM To: Phake Nick Cc: unicode@unicode.org Subject: Re: Egyptian Hieroglyph Man with a Laptop You want "dubious"?! You should see the hundreds of strange characters already encoded in the CJK

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Ken Whistler via Unicode
hieroglyph of a man (or woman) holding a laptop is positively orthodox! --Ken On 2/13/2020 11:47 AM, Phake Nick via Unicode wrote: Those characters could also be put into another block for the same script similar to how dubious characters in CJK are included by placing them into "CJK Compatibi

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Phake Nick via Unicode
Those characters could also be put into another block for the same script similar to how dubious characters in CJK are included by placing them into "CJK Compatibility Ideographs" for round trip compatibility with source encoding. 在 2020年2月14日週五 03:35,Richard Wordingham via Unicode 寫道:

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Richard Wordingham via Unicode
On Thu, 13 Feb 2020 10:18:40 +0100 Hans Åberg via Unicode wrote: > > On 13 Feb 2020, at 00:26, Shawn Steele > > wrote: > >> From the point of view of Unicode, it is simpler: If the character > >> is in use or have had use, it should be included somehow. >

What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-13 Thread wjgo_10...@btinternet.com via Unicode
Hans Åberg >>> From the point of view of Unicode, it is simpler: If the character is in use or have had use, it should be included somehow. Shawn Steele >> That bar, to me, seems too low. Many things are only used briefly or in a private context that doesn't really require encoding. Hans

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Frédéric Grosshans via Unicode
Le 12/02/2020 à 23:30, Michel Suignard a écrit : Interesting that a single character is creating so much feedback, but it is not the first time. Extrapolating from my own case, I guess it’s because hieroglyphs have a strong cultural significance — especially to people following unicode

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Hans Åberg via Unicode
> On 13 Feb 2020, at 00:26, Shawn Steele wrote: > >> From the point of view of Unicode, it is simpler: If the character is in use >> or have had use, it should be included somehow. > > That bar, to me, seems too low. Many things are only used briefly or in a > private context that doesn't

RE: Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Shawn Steele via Unicode
> From the point of view of Unicode, it is simpler: If the character is in use > or have had use, it should be included somehow. That bar, to me, seems too low. Many things are only used briefly or in a private context that doesn't really require encoding. The hieroglyphs discussion is

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Hans Åberg via Unicode
> On 12 Feb 2020, at 23:30, Michel Suignard via Unicode > wrote: > > These abstract collections have started to appear in the first part of the > nineteen century (Champollion starting in 1822). Interestingly these > collections have started to be useful on their own ev

RE: Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Michel Suignard via Unicode
urse if we accept some hieroglyphs for compatibility purpose, but this is not mentioned as a valid reason in any proposal yet. > In my opinion, this is an invalid character, which should not be > included in Unicode. I agree. Frédéric > > On Thu, 12 Feb 2020 19:12:14 +0100 > Frédé

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Markus Scherer via Unicode
On Wed, Feb 12, 2020 at 11:37 AM Marius Spix via Unicode < unicode@unicode.org> wrote: > In my opinion, this is an invalid character, which should not be > included in Unicode. > Please remember that feedback that you want the committee to look at needs to go through http://

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Joe Becker via Unicode
I assume this glyph was created to honor Cleo Huggins, the designer of Sonata at Adobe, who decades ago created a similar hieroglyph of a *woman* in front of her computer. Joe

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Frédéric Grosshans via Unicode
  Frédéric On Thu, 12 Feb 2020 19:12:14 +0100 Frédéric Grosshans via Unicode wrote: Dear Unicode list members (CC Michel Suignard),   the Unicode proposal L2/20-068 <https://www.unicode.org/L2/L2020/20068-n5128-ext-hieroglyph.pdf>, “Revised draft for the encoding of an extended Egyp

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Marius Spix via Unicode
it is also possible that the someone wanted to smuggle an easter-egg into Unicode or just test if the quality assurance works. In my opinion, this is an invalid character, which should not be included in Unicode. On Thu, 12 Feb 2020 19:12:14 +0100 Frédéric Grosshans via Unicode wrote: > Dear

Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Frédéric Grosshans via Unicode
Dear Unicode list members (CC Michel Suignard),   the Unicode proposal L2/20-068 <https://www.unicode.org/L2/L2020/20068-n5128-ext-hieroglyph.pdf>, “Revised draft for the encoding of an extended Egyptian Hieroglyphs repertoire, Groups A to N” (

RE: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-12 Thread Sławomir Osipiuk via Unicode
On Wed, Feb 12, 2020 at 11:28 AM wjgo_10...@btinternet.com via Unicode wrote: > > I am reminded of the teletext system (with brand names such as Ceefax and > Oracle) in the United Kingdom, which was a broadcasting technology introduced > in the 1970s and which became very

RE: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-12 Thread wjgo_10...@btinternet.com via Unicode
Hi At the time, I thought that my post yesterday concluded the thread. However, later something occurred to me as a result of something in the post by Sławomir Osipiuk. The gentleman wrote as follows: Sending multiples of the same message in different languages is really only applicable

Re: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-11 Thread wjgo_10...@btinternet.com via Unicode
Hi Thank you to everybody who replied to this thread, both online and offline. Sławomir Osipiuk wrote: As for "concatenation of such plain text sequences" where each sequence is in a different language, ... Actually I was meaning the concatenation of a number of messages, one from each

Re: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-10 Thread Mark E. Shoulson via Unicode
On 2/10/20 6:14 PM, Sławomir Osipiuk via Unicode wrote: As for "concatenation of such plain text sequences" where each sequence is in a different language, I must again ask: Is there a system that actually does this, that does not have a higher-level protocol that can carry meta

RE: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-10 Thread Sławomir Osipiuk via Unicode
The examples given don't convince me that "higher-level protocols" would not be sufficient. There are very few messages being sent in the "Internet of Things" that are truly plain-text. Even those that use a text base (as opposed to binary data) are still in some kind of structured computer

Re: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-10 Thread Steffen Nurpmeso via Unicode
wjgo_10...@btinternet.com via Unicode wrote in <141cecf1.23e.1702ea529c1.webtop@btinternet.com>: |Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good |reason why I ask | |There is a German song, Lorelei, and I searched to find an English |translation. Regarding

Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-10 Thread wjgo_10...@btinternet.com via Unicode
Hi Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask There is a German song, Lorelei, and I searched to find an English translation. I found the following video. https://www.youtube.com/watch?v=lJ3JhxOUbw0 The video is an instrumental version and is

Re: Combining Marks and Variation Selectors

2020-02-02 Thread Asmus Freytag via Unicode
On 2/2/2020 5:22 PM, Richard Wordingham via Unicode wrote: On Sun, 2 Feb 2020 16:20:07 -0800 Eric Muller via Unicode wrote: That would imply some coordination among variations sequences on different code points, right? E.g. <0B48> ≡ <0

Re: Combining Marks and Variation Selectors

2020-02-02 Thread Richard Wordingham via Unicode
On Sun, 2 Feb 2020 16:20:07 -0800 Eric Muller via Unicode wrote: > That would imply some coordination among variations sequences on > different code points, right? > > E.g. <0B48> ≡ <0B47, 0B56>, so a variation sequence on 0B56 (Mn, > ccc=0) would imply the existe

Re: Combining Marks and Variation Selectors

2020-02-02 Thread Eric Muller via Unicode
me variation selector, and the same effect. Eric. On 2/2/2020 11:43 AM, Mark Davis ☕️ via Unicode wrote: I don't think there is a technical reason for disallowing variation selectors after any starters (ccc=000); the normalizati

Re: Combining Marks and Variation Selectors

2020-02-02 Thread Mark Davis ☕️ via Unicode
I don't think there is a technical reason for disallowing variation selectors after any starters (ccc=000); the normalization algorithm doesn't care about the general category of characters. Mark On Sun, Feb 2, 2020 at 10:09 AM Richard Wordingham via Unicode < unicode@unicode.org>

Re: Combining Marks and Variation Selectors

2020-02-02 Thread Richard Wordingham via Unicode
On Sun, 2 Feb 2020 07:51:56 -0800 Ken Whistler via Unicode wrote: > What it comes down to is avoidance of conundrums involving canonical > reordering for normalization. The effect of variation selectors is > defined in terms of an immediate adjacency. If you allowed variation

Re: Combining Marks and Variation Selectors

2020-02-02 Thread Ken Whistler via Unicode
explicit, for example. --Ken On 2/1/2020 7:30 PM, Richard Wordingham via Unicode wrote: Ah, I missed that change from Version 5.0, where the restriction was, 'The base character in a variation sequence is never a combining character or a decomposable character'. I now need to rephrase

Re: Combining Marks and Variation Selectors

2020-02-01 Thread Richard Wordingham via Unicode
On Sat, 1 Feb 2020 17:59:57 -0800 Roozbeh Pournader via Unicode wrote: > They are actually allowed on combining marks of ccc=0. We even define > one such variation sequence for Myanmar, IIRC. > > On Sat, Feb 1, 2020, 2:12 PM Richard Wordingham via Unicode < > unicode@

Re: Combining Marks and Variation Selectors

2020-02-01 Thread Roozbeh Pournader via Unicode
They are actually allowed on combining marks of ccc=0. We even define one such variation sequence for Myanmar, IIRC. On Sat, Feb 1, 2020, 2:12 PM Richard Wordingham via Unicode < unicode@unicode.org> wrote: > Why are variation selectors not allowed for combining marks? I can see

Combining Marks and Variation Selectors

2020-02-01 Thread Richard Wordingham via Unicode
Why are variation selectors not allowed for combining marks? I can see a reason for them not being allowed on characters with non-zero canonical combining classes, but not for them being prohibited for combining marks that are starters, i.e. have ccc=0. Richard.
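Editorial note: the distinction drawn in this question, between combining marks in general and combining marks that are starters (ccc=0), can be inspected with Python's standard unicodedata module. The sketch below is only an illustration, not part of the thread. The helper names is_combining_mark and is_starter are hypothetical, and the sample characters are chosen for the example (U+0B56 is the Mn, ccc=0 character cited in a later reply).

```python
import unicodedata


def is_combining_mark(ch: str) -> bool:
    """True if the General_Category is Mn, Mc, or Me."""
    return unicodedata.category(ch).startswith("M")


def is_starter(ch: str) -> bool:
    """True if the Canonical_Combining_Class (ccc) is 0."""
    return unicodedata.combining(ch) == 0


samples = [
    "\u0301",  # COMBINING ACUTE ACCENT: a mark with non-zero ccc
    "\u0B56",  # ORIYA AI LENGTH MARK: a mark that is also a starter
    "A",       # LATIN CAPITAL LETTER A: not a mark at all
]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        f"ccc={unicodedata.combining(ch)}",
        "mark" if is_combining_mark(ch) else "non-mark",
        "starter" if is_starter(ch) else "non-starter",
    )
```

Only marks with a non-zero ccc take part in canonical reordering, which is why the question singles out the ccc=0 case.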

Re: Adding Experimental Control Characters for Tai Tham

2020-01-29 Thread Ken Whistler via Unicode
unrelated would probably not be a good idea. For experimentation purposes, VS13 and VS14 would be safer. --Ken On 1/25/2020 10:41 AM, Richard Wordingham via Unicode wrote: How inappropriate would it be to usurp a pair of variation selectors for this purpose? For mnemonic purposes, I would suggest

Adding Experimental Control Characters for Tai Tham

2020-01-25 Thread Richard Wordingham via Unicode
This topic is very similar to the recent topic "How to make custom combining diacritical marks for arabic letters?". There is a suggestion that the encoding of Tai Tham syllables be changed (https://www.unicode.org/L2/L2019/19365-tai-tham-structure.pdf, by Martin Hosken), and there is a strong

Stop words for CLDR

2020-01-23 Thread Marius Spix via Unicode
I wonder if there is any interest in adding stop words to CLDR? Stop words are ignored by natural language processing algorithms, with use cases like search engines, word clouds and text classification. There are already existing collections with stop words like [1] or [2] which could be used,

Re: [unihan] Unihan variants information

2020-01-17 Thread jenkins via Unicode
Very impressive! Thank you for this. > On Jan 17, 2020, at 6:03 AM, Michel Mariani via Unihan > wrote: > > FYI, the "Unihan Variants" utility has been recently added to the open-source > application Unicopedia Plus . > It provides both the

Re: how to make custom combining diacritical marks for arabic letters?

2020-01-15 Thread dinar qurbanov via Unicode
rea for that. 2020-01-14 20:02 GMT+03:00, Lorna Evans : > What are the combining marks supposed to look like? Are they your > creation or do you have samples of usage? It is true that you will not > likely get combining marks to work if either they or the base character > are PUA.

Re: New Unicode Working Group: Message Formatting

2020-01-14 Thread Philippe Verdy via Unicode
, Transifex, Google Translator, ResourceBundle and formatting API in Java, .po/.pot for Gettext in many opensource projects, Facebook translation tool, internationalization APIs in Windows, iOS, macOS, and the ICU library which is the de facto base for CLDR... On Tue, 14 Jan 2020 at 16:11, wjgo_10

Re: how to make custom combining diacritical marks for arabic letters?

2020-01-14 Thread Lorna Evans via Unicode
:30 PM, dinar qurbanov via Unicode wrote: hello. you can browse to replies that are not quoted below from https://unicode.org/mail-arch/unicode-ml/y2018-m05/0039.html . where can i write some bug reports or feature requests in order to get custom diacritic marks automatically positioned at right

Re: New Unicode Working Group: Message Formatting

2020-01-14 Thread Nelson H. F. Beebe via Unicode
William, this is off the Unicode list. See http://mathreader.livejournal.com/9239.html for a list of 207 variants of Chebyshev's name. --- - Nelson H. F. Beebe Tel: +1 801 581 5254

Re: New Unicode Working Group: Message Formatting

2020-01-14 Thread wjgo_10...@btinternet.com via Unicode
The reply from Mr Verdy has indeed been helpful, as indeed has also been an offlist private reply from someone who has, thus far, not been a participant in this thread. Mr Verdy wrote: You seem to have never seen how translation packages work and are used in common projects (not just CLDR,

Re: Geological symbols

2020-01-14 Thread Hans Åberg via Unicode
For rendering, you might have a look at ConTeXt, because I recall it has an option whereby Unicode super- and sub-scripts can be displayed over each other without extra processing. > On 14 Jan 2020, at 06:44, via Unicode wrote: > > Thanks for your reply. I think actually LaTeX is n

AW: Geological symbols

2020-01-13 Thread via Unicode
t know how widespread the use of LaTeX is among geologists, but notation like this is a perfect use case for LaTeX. --Jörg Knappen Sent: Monday, 13 January 2020 at 12:20 From: "Thomas Spehs (MonMap) via Unicode" <unicode@unicode.org> To: unicode@unicode.org

Re: New Unicode Working Group: Message Formatting

2020-01-13 Thread Steven R. Loomis via Unicode
> On Jan 11, 2020, at 11:37 AM, wjgo_10...@btinternet.com via Unicode > wrote: > > A person in England, … As noted in the blog, the scope of this working group is a syntax for “adapting programs”. It is not intended for individual communication between two perso

Re: Geological symbols

2020-01-13 Thread Philippe Verdy via Unicode
, and even if these conventions are used, the assumptions made could sometimes infer the incorrect layout. On Mon, 13 Jan 2020 at 17:16, Oren Watson via Unicode wrote: > This is not possible in Unicode plaintext as far as I can tell, since > Unicode doesn't allow overstriking arbitrary character

Re: New Unicode Working Group: Message Formatting

2020-01-13 Thread wjgo_10...@btinternet.com via Unicode
I notice that in the web page https://github.com/unicode-org/message-format-wg/issues/3 there is a request to add more features. One of those requested features is as follows Inflections (genders, articles, declensions, etc.) So I am wondering quite what formats will be covered by the

Re: Geological symbols

2020-01-13 Thread Oren Watson via Unicode
kerning. On Mon, Jan 13, 2020 at 10:14 AM Thomas Spehs (MonMap) via Unicode < unicode@unicode.org> wrote: > Hi, I would like to ask if there is any way to create geological “symbols” > with Unicode such as: Q₁¹ˉ², but with the two “1”s over each other, > without a space. Thanks! >

Re: New Unicode Working Group: Message Formatting

2020-01-11 Thread Philippe Verdy via Unicode
in multiple languages or the language of user's choice. So your question is nonsense with the example you give. On Sat, 11 Jan 2020 at 21:21, wjgo_10...@btinternet.com via Unicode < unicode@unicode.org> wrote: > A person in England, who knows no German, wants to send the parcel to a

Re: New Unicode Working Group: Message Formatting

2020-01-11 Thread wjgo_10...@btinternet.com via Unicode
A person in England, who knows no German, wants to send the parcel to a person in Germany, who knows no English. The person in England wants to send a message about the delivery to the person in Germany. English: “The package will arrive at {time} on {date}.” The person wants to send the

Re: New Unicode Working Group: Message Formatting

2020-01-10 Thread James Kass via Unicode
the *format* of the strings, not their *repertoire*. That is, should the string be “Arrival: %s” or “Arrival: ${date}” or “Arrival: {0}”? Does that answer your question? -- Steven R. Loomis | @srl295 | git.io/srl295 On Jan 10, 2020, at 2:48 PM, James Kass via Unicode wrote
