Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-15 Thread wjgo_10...@btinternet.com via Unicode
Message -- From: "via Unicode" To: wjgo_10...@btinternet.com Cc: unicode@unicode.org Sent: Saturday, 2020 Feb 15 At 10:11 Subject: Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop) Hi William, I don't fully understand your propose

Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-15 Thread via Unicode
? Couldn't you simply capitalize on the rules that already exist for entities? Best wishes, jk -- Joel Kalvesmaki Director, Text Alignment Network http://textalign.net On 2020-02-14 15:52, wjgo_10...@btinternet.com via Unicode wrote: The solution is to invent my own encoding space. This sits on top

Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-14 Thread wjgo_10...@btinternet.com via Unicode
The solution is to invent my own encoding space. This sits on top of Unicode, could be (perhaps?) called markup, but it works! It may be perilous, because some software may enforce the strict official code point limits. I have now realized that what I wrote before is ambiguous. When I

Re: What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-14 Thread Hans Åberg via Unicode
> On 13 Feb 2020, at 16:41, wjgo_10...@btinternet.com via Unicode > wrote: > > Yet a Private Use Area encoding at a particular code point is not unique. > Thus, except with care amongst people who are aware of the particular > encoding, there is no interoperability, su

What should or should not be encoded in Unicode? (from Re: Egyptian Hieroglyph Man with a Laptop)

2020-02-13 Thread wjgo_10...@btinternet.com via Unicode
Hans Åberg >>> From the point of view of Unicode, it is simpler: If the character is in use or has had use, it should be included somehow. Shawn Steele >> That bar, to me, seems too low. Many things are only used briefly or in a private context that doesn't really require e

Re: New Unicode Working Group: Message Formatting

2020-01-14 Thread Philippe Verdy via Unicode
, Transifex, Google Translator, RessourceBundle and formatting API in Java, .po/.pot for Gettext in many opensource projects, Facebook translation tool, internationalization APIs in Windows, iOS, MacOS, and the ICU library which is the de facto base for CLDR... Le mar. 14 janv. 2020 à 16:11, wjgo_10

Re: New Unicode Working Group: Message Formatting

2020-01-14 Thread Nelson H. F. Beebe via Unicode
William, this is off the Unicode list. See http://mathreader.livejournal.com/9239.html for a list of 207 variants of Chebyshev's name. --- - Nelson H. F. Beebe Tel: +1 801 581 5254

Re: New Unicode Working Group: Message Formatting

2020-01-14 Thread wjgo_10...@btinternet.com via Unicode
The reply from Mr Verdy has indeed been helpful, as indeed has also been an offlist private reply from someone who has, thus far, not been a participant in this thread. Mr Verdy wrote: You seem to have never seen how translation packages work and are used in common projects (not just CLDR,

Re: New Unicode Working Group: Message Formatting

2020-01-13 Thread Steven R. Loomis via Unicode
> El ene. 11, 2020, a las 11:37 a. m., wjgo_10...@btinternet.com via Unicode > escribió: > > A person in England, … As noted in the blog, the scope of this working group is a syntax for “adapting programs”. It is not intended for individual communication between two perso

Re: New Unicode Working Group: Message Formatting

2020-01-13 Thread wjgo_10...@btinternet.com via Unicode
I notice that in the web page https://github.com/unicode-org/message-format-wg/issues/3 there is a request to add more features. One of those requested features is as follows: Inflections (genders, articles, declensions, etc.) So I am wondering quite what formats will be covered

Re: New Unicode Working Group: Message Formatting

2020-01-11 Thread Philippe Verdy via Unicode
in multiple languages or the language of user's choice. So your question is non-sense with the example you give. Le sam. 11 janv. 2020 à 21:21, wjgo_10...@btinternet.com via Unicode < unicode@unicode.org> a écrit : > A person in England, who knows no German, wants to send the parcel to a

Re: New Unicode Working Group: Message Formatting

2020-01-11 Thread wjgo_10...@btinternet.com via Unicode
A person in England, who knows no German, wants to send the parcel to a person in Germany, who knows no English. The person in England wants to send a message about the delivery to the person in Germany. English: “The package will arrive at {time} on {date}.” The person wants to send the

Re: New Unicode Working Group: Message Formatting

2020-01-10 Thread James Kass via Unicode
the *format* of the strings, not their *repertoire*. That is, should the string be “Arrival: %s” or “Arrival: ${date}” or “Arrival: {0}”? Does that answer your question? -- Steven R. Loomis | @srl295 | git.io/srl295 El ene. 10, 2020, a las 2:48 p. m., James Kass via Unicode escribió
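The three candidate formats named in the message above can be compared side by side. This is only an illustrative sketch using Python's standard library; the variable names and the sample date are assumptions, and nothing here reflects the syntax the working group eventually chose.

```python
from string import Template

# Hypothetical sample value; any date string would do.
date = "2020-01-10"

# printf-style positional placeholder: "Arrival: %s"
printf_style = "Arrival: %s" % date

# Named placeholder: "Arrival: ${date}"
template_style = Template("Arrival: ${date}").substitute(date=date)

# Indexed placeholder: "Arrival: {0}"
brace_style = "Arrival: {0}".format(date)

print(printf_style)
print(template_style)
print(brace_style)
```

All three produce the same text; the difference lies only in how a translator sees and reorders the placeholders, which is the question of format rather than repertoire.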

Re: New Unicode Working Group: Message Formatting

2020-01-10 Thread Steven R. Loomis via Unicode
las 2:48 p. m., James Kass via Unicode > escribió: > > > On 2020-01-10 9:55 PM, announceme...@unicode.org wrote: >> But until now we have not had a syntax for localizable message strings >> standardized by Unicode. > > What is the difference between “localizabl

Re: New Unicode Working Group: Message Formatting

2020-01-10 Thread James Kass via Unicode
* sentences On 2020-01-10 10:48 PM, James Kass wrote: On 2020-01-10 9:55 PM, announceme...@unicode.org wrote: But until now we have not had a syntax for localizable message strings standardized by Unicode. What is the difference between “localizable message strings” and “localized

Re: New Unicode Working Group: Message Formatting

2020-01-10 Thread James Kass via Unicode
On 2020-01-10 9:55 PM, announceme...@unicode.org wrote: But until now we have not had a syntax for localizable message strings standardized by Unicode. What is the difference between “localizable message strings” and “localized sentances”?  Asking for a friend.

Re: Call for feedback on UTS #18: Unicode Regular Expressions

2020-01-02 Thread Mark Davis ☕️ via Unicode
e above line, but a recap that didn't have precisely the same description. It's best to point to the exact description, and have that be in one place. Mark On Thu, Jan 2, 2020 at 6:40 PM Karl Williamson via Unicode < unicode@unicode.org> wrote: > One thing I noticed in reviewing this is the

Re: Call for feedback on UTS #18: Unicode Regular Expressions

2020-01-02 Thread Karl Williamson via Unicode
should use a loose match, disregarding case, spaces and hyphen (the underbar character "_" cannot occur in Unicode character names). An implementation may also choose to allow namespaces, where some prefix like "LATIN LETTER" is set globally and used if there i
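A loose match of the kind described above might be sketched as follows. This is a simplified illustration, not the matching rule from UTS #18 itself: it drops every space and hyphen and ignores case, without the special handling some names need, and the helper names `loose_key`, `build_index` and `lookup_loose` are made up for this example.

```python
import unicodedata

def loose_key(name):
    """Fold a character name for loose comparison: ignore case, spaces, hyphens."""
    return "".join(ch for ch in name.upper() if ch not in " -")

def build_index(code_points):
    """Map the folded name of each named code point to its character."""
    index = {}
    for cp in code_points:
        name = unicodedata.name(chr(cp), None)  # None for unnamed code points
        if name is not None:
            index[loose_key(name)] = chr(cp)
    return index

# Small illustrative range (Basic Latin through Latin Extended-B).
INDEX = build_index(range(0x20, 0x250))

def lookup_loose(name):
    """Return the character whose name loosely matches, or None."""
    return INDEX.get(loose_key(name))

print(lookup_loose("latin small letter a"))
print(lookup_loose("Hyphen Minus"))
```

Folding both the stored names and the query through the same key function keeps the comparison symmetric, so "hyphen-minus", "HYPHEN MINUS" and "hyphenminus" all resolve to the same entry.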

RE: Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-21 Thread Peter Constable via Unicode
I suspect if you look at the JPEG and MPEG standards you'll find there is a normative reference to Unicode or ISO/IEC 10646. Same for standards specifying C, ECMAScript and other languages in which modern software is written. So, arguably the statement isn't much of a stretch at all. Peter

Re: Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-20 Thread Richard Wordingham via Unicode
On Tue, 19 Nov 2019 20:02:55 + James Kass via Unicode wrote: > On 2019-11-19 6:59 PM, Costello, Roger L. via Unicode wrote: > > Today I received an email from the Unicode organization. The email > > said this: (italics and yellow highlighting are mine) > > >

Re: Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-19 Thread James Kass via Unicode
On 2019-11-19 11:00 PM, Mark E. Shoulson via Unicode wrote: Why so concerned with these minutiæ? Were you in fact misled?  (Doesn't sound like it.)  Do you know someone who was, or whom you fear would be?  What incorrect conclusions might they draw from that misunderstanding, and how serious

Re: Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-19 Thread Asmus Freytag via Unicode
On 11/19/2019 3:00 PM, Mark E. Shoulson via Unicode wrote: It says "foundation", not "sum total, all there is."  I don't think this is much overreach.  MAYBE it counts as "enthusiastic", but not misleadi

Re: Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-19 Thread Mark E. Shoulson via Unicode
ho was, or whom you fear would be?  What incorrect conclusions might they draw from that misunderstanding, and how serious would they be?  Doesn't sound like this is really anything serious even if you were right. ~mark On 11/19/19 1:59 PM, Costello, Roger L. via Unicode wrote: Hi Folks,

RE: Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-19 Thread Jonathan Rosenne via Unicode
As a user of bidirectional text when I think of our world before Unicode and the situation today I cannot but wholeheartedly agree. Without Unicode, few international vendors, major and in particular minor ones, would have considered implementing Hebrew in their products. Now we have

Re: Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-19 Thread Asmus Freytag via Unicode
On 11/19/2019 12:04 PM, Michael Everson via Unicode wrote: Of course it’s not “misleading”. Human language is best conveyed by text. One could insert the language in [ ] to make the claim sound less like an overreach. It doesn't even impede

Re: Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-19 Thread James Kass via Unicode
On 2019-11-19 6:59 PM, Costello, Roger L. via Unicode wrote: Today I received an email from the Unicode organization. The email said this: (italics and yellow highlighting are mine) The Unicode Standard is the foundation for all modern software and communications around the world, including

Re: Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-19 Thread Michael Everson via Unicode
Of course it’s not “misleading”. Human language is best conveyed by text. Michael Everson > On 19 Nov 2019, at 18:59, Costello, Roger L. via Unicode > wrote: > > Hi Folks, > > Today I received an email from the Unicode organization. The email said this: > (italics an

Is the Unicode Standard "The foundation for all modern software and communications around the world"?

2019-11-19 Thread Costello, Roger L. via Unicode
Hi Folks, Today I received an email from the Unicode organization. The email said this: (italics and yellow highlighting are mine) The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops

Re: Grapheme clusters & backspace (was: Unicode Digest, Vol 70, Issue 17)

2019-10-23 Thread Richard Wordingham via Unicode
On Wed, 23 Oct 2019 02:31:09 + Ben Morphett via Unicode wrote: > It totally depends on the editor. In Notepad++, when I backspace > over "Man Teacher: Dark Skin Tone", I get "Man Teacher: Dark Skin > Tone" => ""Man: Dark Skin Tone"

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-12 Thread Richard Wordingham via Unicode
On Sat, 12 Oct 2019 18:15:38 +0800 Fred Brennan via Unicode wrote: > Indeed - it is extremely unfortunate that users will need to wait > until 2021(!) to get it into Unicode so Google will finally add it to > the Noto fonts. > If that's just how things are done, fine, I certainly

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-12 Thread Ken Whistler via Unicode
On 10/12/2019 3:15 AM, Fred Brennan via Unicode wrote: There seems to be no conscionable reason for such a long delay after the approval. If that's just how things are done, fine, I certainly can't change the whole system. But imagine if you had to wait two years to even have a chance

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-12 Thread Richard Wordingham via Unicode
On Sat, 12 Oct 2019 18:15:38 +0800 Fred Brennan via Unicode wrote: > Indeed - it is extremely unfortunate that users will need to wait > until 2021(!) to get it into Unicode so Google will finally add it to > the Noto fonts. > There seems to be no conscionable reason for such a long

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-12 Thread Fred Brennan via Unicode
On Saturday, October 12, 2019 6:28:01 AM PST Rebecca Bettencourt via Unicode wrote: > This proposal was special in that it was asking the Unicode Consortium to > recognize a character that was already being used unofficially, so that > organizations like the Google Noto team who are

Re: Website format (was Re: Unicode website glitches. (was The Most Frequent Emoji))

2019-10-12 Thread Asmus Freytag via Unicode
On 10/12/2019 1:16 AM, Daniel Bünzli via Unicode wrote: With all due respect for the work that has been done on the new website I think that the new structure significantly decreased the usability of the website for technical users

Website format (was Re: Unicode website glitches. (was The Most Frequent Emoji))

2019-10-12 Thread Daniel Bünzli via Unicode
On 12 October 2019 at 02:05:23, Martin J. Dürst via Unicode (unicode@unicode.org) wrote: > I think it's less the format and much more the split personality of the > Unicode Web site(s?) that I have problems with. I also do.  One thing that is particularly annoying is the fact that the

Re: Unicode website glitches. (was The Most Frequent Emoji)

2019-10-11 Thread Martin J . Dürst via Unicode
n confirm that a hard reload fixed the problem. > BTW, if you want to comment on the format as opposed to glitches, please > change the subject line. I think it's less the format and much more the split personality of the Unicode Web site(s?) that I have problems with. Regards,

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Rebecca Bettencourt via Unicode
0D." (L2/19-258R, page 6) This proposal was special in that it was asking the Unicode Consortium to recognize a character that was already being used unofficially, so that organizations like the Google Noto team who are sticklers for Unicode compliance would include it. :)

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Doug Ewell via Unicode
Ken Whistler wrote: > So, in general, no, you can *never* assume that once the UTC has just > approved a new character that it will be in the next version of > Unicode. I got quite a few messages like this when UTC approved the legacy computing characters in L2/19-025 last Janua

Unicode website glitches. (was The Most Frequent Emoji)

2019-10-11 Thread Mark Davis ☕️ via Unicode
change the subject line. Mark On Thu, Oct 10, 2019 at 11:50 PM Martin J. Dürst via Unicode < unicode@unicode.org> wrote: > I had a look at the page with the frequencies. Many emoji didn't > display, but that's my browser's problem. What was worse was that the > sidebar and the stuf

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Ken Whistler via Unicode
Sorry about the typo there. I meant "the published Version 13.0 next March" --Ken On 10/11/2019 10:17 AM, Ken Whistler wrote: then eventually in the published Version 13.0 next month:

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Ken Whistler via Unicode
.org/alloc/Pipeline.html#planned_next_version Characters listed in the "Characters for Future Versions" table: https://www.unicode.org/alloc/Pipeline.html#future are not yet targeted for any particular version. Many of them, including the Tagalog letter RA, will end up published in Unicode 14.

Re: Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Markus Scherer via Unicode
On Fri, Oct 11, 2019 at 4:37 AM Fred Brennan via Unicode < unicode@unicode.org> wrote: > Many users are asking me and I'm not sure of the answer (nor how to find > it > out). > You can find out by looking at the data files that are being developed for Unicode 13.

Will TAGALOG LETTER RA, currently in the pipeline, be in the next version of Unicode?

2019-10-11 Thread Fred Brennan via Unicode
Many users are asking me and I'm not sure of the answer (nor how to find it out). The UTC approved it, so it will be in the next version of Unicode, right? We sure hope so...it is a character needed to write a script in current use. Although only a minority of people care about

Re: Access to the Unicode technical site (was: Re: Unicode's got a new logo?)

2019-07-19 Thread Steffen Nurpmeso via Unicode
Hello Mr. Ken Whistler. Ken Whistler wrote in <3d1676bb-f3c1-8a3e-fdc5-1c0bdd74a...@sonic.net>: |On 7/18/2019 11:50 AM, Steffen Nurpmeso via Unicode wrote: |> I also decided to enter /L2 directly from now on. | |For folks wishing to access the UTC document register, Unicode |C

Access to the Unicode technical site (was: Re: Unicode's got a new logo?)

2019-07-18 Thread Ken Whistler via Unicode
On 7/18/2019 11:50 AM, Steffen Nurpmeso via Unicode wrote: I also decided to enter /L2 directly from now on. For folks wishing to access the UTC document register, Unicode Consortium standards, and so forth, all of those links will be permanently stable. They are not impacted

Re: Unicode "no-op" Character?

2019-07-12 Thread Sławomir Osipiuk via Unicode
whose canonical equivalent is the absence of a character. The logical consequences of that statement apply fully. On Wed, Jul 3, 2019 at 8:00 PM Shawn Steele via Unicode wrote: > > Even more complicated is that, as pointed out by others, it's pretty much > impossible to say "thes

RE: Unicode "no-op" Character?

2019-07-04 Thread Doug Ewell via Unicode
would use codepoint 1 for their own > thing, and there'd be a conflict. That's pretty much what happened with NUL. It was originally intended (long, long before Unicode) to be ignorable and have no meaning, but then other processes were designed that gave it specific meaning, and that was pretty much

RE: Unicode "no-op" Character?

2019-07-03 Thread Shawn Steele via Unicode
for the dude that didn't get the memo and made their own scheme.) Unicode was explicitly intended *not* to encode any of that kind of markup, and, instead, be "plain text," leaving other interesting metadata to other higher level protocols. Whether those be word breaking, sentence parsi

Re: Unicode "no-op" Character?

2019-07-03 Thread Richard Wordingham via Unicode
On Wed, 3 Jul 2019 17:51:29 -0400 "Mark E. Shoulson via Unicode" wrote: > I think the idea being considered at the outset was not so complex as > these (and indeed, the point of the character was to avoid making > these kinds of decisions). Shawn Steele appe

Re: Unicode "no-op" Character?

2019-07-03 Thread Mark E. Shoulson via Unicode
What you're asking for, then, is completely possible and achievable—but not in the Unicode Standard.  It's out of scope for Unicode, it sounds like.  You've said you realize it won't happen in Unicode, but it still can happen.  Go forth and implement it, then: make your higher-level protocol

Re: Unicode "no-op" Character?

2019-07-03 Thread Mark E. Shoulson via Unicode
, Richard Wordingham via Unicode wrote: On Sat, 22 Jun 2019 23:56:50 + Shawn Steele via Unicode wrote: + the list. For some reason the list's reply header is confusing. From: Shawn Steele Sent: Saturday, June 22, 2019 4:55 PM To: Sławomir Osipiuk Subject: RE: Unicode "no-op"

Re: Unicode "no-op" Character?

2019-07-03 Thread Mark E. Shoulson via Unicode
rk On 7/3/19 11:44 AM, Sławomir Osipiuk via Unicode wrote: A process, let’s call it Process W, adds a bunch of U+000F to a string it received, or built, or a user entered via keyboard. Maybe it’s to packetize. Maybe to mark every word that is an anagram of the name of a famous 19th-centu

Re: Unicode "no-op" Character?

2019-07-03 Thread Ken Whistler via Unicode
On 7/3/2019 10:47 AM, Sławomir Osipiuk via Unicode wrote: Is my idea impossible, useless, or contradictory? Not at all. What you are proposing is in the realm of higher-level protocols. You could develop such a protocol, and then write processes that honored it, or try to convince others

Re: Unicode "no-op" Character?

2019-07-03 Thread Rebecca Bettencourt via Unicode
On Wed, Jul 3, 2019 at 8:47 AM Sławomir Osipiuk via Unicode < unicode@unicode.org> wrote: > Security gateways filter it out completely, as a matter of best practice > and security-in-depth. > > > > A process, let’s call it Process W, adds a bunch of U+000F to a string

RE: Unicode "no-op" Character?

2019-07-03 Thread Sławomir Osipiuk via Unicode
, useless, or contradictory? Not at all. From: Mark Davis ☕️ [mailto:m...@macchiato.com] Sent: Wednesday, July 03, 2019 13:33 To: Sławomir Osipiuk Cc: verdy_p; unicode Unicode Discussion Subject: Re: Unicode "no-op" Character? Your goal is not achievable. We can't wave a

Re: Unicode "no-op" Character?

2019-07-03 Thread Mark Davis ☕️ via Unicode
Your goal is not achievable. We can't wave a magic wand and have all processes everywhere suddenly (or even within decades) ignore U+000F in all processing; that will not happen. This thread is pointless and should be terminated. Mark On Wed, Jul 3, 2019 at 5:48 PM Sławomir Osipiuk via Unicode < unic

RE: Unicode "no-op" Character?

2019-07-03 Thread Sławomir Osipiuk via Unicode
I’m frustrated at how badly you seem to be missing the point. There is nothing impossible nor self-contradictory here. There is only the matter that Unicode requires all scalar values to be preserved during interchange. This is in many ways a good idea, and I don’t expect it to change

Aw: Re: Unicode "no-op" Character?

2019-07-03 Thread Marius Spix via Unicode
nd is used for arbitrary length integers or other variable length structures where terminator characters like 0x00 may be part of the data.       Gesendet: Mittwoch, 03. Juli 2019 um 10:49 Uhr Von: "Philippe Verdy via Unicode" An: "Sławomir Osipiuk" Cc: "unicode Unicod

Re: Unicode "no-op" Character?

2019-07-03 Thread Philippe Verdy via Unicode
of any new character in Unicode. But if your protocol does not allow any form of escaping, then it is broken as it cannot transport **all** valid Unicode text. Le mer. 3 juil. 2019 à 10:49, Philippe Verdy a écrit : > Le mer. 3 juil. 2019 à 06:09, Sławomir Osipiuk a > écrit : > >>

Re: Unicode "no-op" Character?

2019-07-03 Thread Philippe Verdy via Unicode
Le mer. 3 juil. 2019 à 06:09, Sławomir Osipiuk a écrit : > I don’t think you understood me at all. I can packetize a string with any > character that is guaranteed not to appear in the text. > Your goal is **impossible** to reach with Unicode. Assume such character is "add

RE: Unicode "no-op" Character?

2019-07-02 Thread Sławomir Osipiuk via Unicode
a tool like that would make some tasks much faster and simpler. Your proposed solution doesn’t. From: Philippe Verdy [mailto:verd...@wanadoo.fr] Sent: Saturday, June 29, 2019 15:47 To: Sławomir Osipiuk Cc: Shawn Steele; unicode Unicode Discussion Subject: Re: Unicode "no-op"

Re: Unicode "no-op" Character?

2019-06-29 Thread Philippe Verdy via Unicode
If you want to "packetize" arbitrarily long Unicode text, you don't need any new magic character. Just prepend your packet with a base character used as a syntactic delimiter, that does not combine with what follows in any normalization. There's a fine character for that: the TAB contr

New control characters! (was: Re: Unicode "no-op" Character?)

2019-06-25 Thread Sławomir Osipiuk via Unicode
All right. Thanks to everyone who offered suggestions. I think the final choice will depend on the specific application, if I ever face this puzzle again. If nothing else, this discussion has helped me formulate what exactly it is I'm imagining, which is actually a bit different than what I

Re: Unicode "no-op" Character?

2019-06-24 Thread J Decker via Unicode
On Mon, Jun 24, 2019 at 5:35 PM David Starner via Unicode < unicode@unicode.org> wrote: > On Sun, Jun 23, 2019 at 10:41 PM Shawn Steele via Unicode > wrote: > > IMO, since it's unlikely that anyone expects > that they can transmit a NUL through an arbitrary channel, unlike a

Re: Unicode "no-op" Character?

2019-06-24 Thread David Starner via Unicode
On Sun, Jun 23, 2019 at 10:41 PM Shawn Steele via Unicode wrote: > Which leads us to the key. The desire is for a character that has no public > meaning, but has some sort of private meaning. In other words it has a > private use. Oddly enough, there is a group of characters

RE: Unicode "no-op" Character?

2019-06-24 Thread Sławomir Osipiuk via Unicode
eeds to be preserved and survive round-trip transmission (in fact the Unicode standard requires that). The second implies that it can be discarded. The first implies that it should be displayed to the user even if only as an "unknown something here". The second implies it should be ignored complet

RE: Unicode "no-op" Character?

2019-06-23 Thread Shawn Steele via Unicode
rfered with the processing of the string, they'd need to be stripped, but you're sort of already in that position by having a private flag in the middle of a string. -Shawn -Original Message- From: Unicode On Behalf Of Slawomir Osipiuk via Unicode Sent: Saturday, June 22, 2019 6:10

RE: Unicode "no-op" Character?

2019-06-23 Thread Sławomir Osipiuk via Unicode
Ah, sorry. I meant to say that the string should always be normalized (not "sanitized") before being checked for exploits (i.e. sanitized). -Original Message- From: Sławomir Osipiuk [mailto:sosip...@gmail.com] Sent: Sunday, June 23, 2019 20:28 To: unicode@unicode.org Cc
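The ordering described above (normalize first, then check) can be illustrated with a small sketch. The choice of NFKC and the toy check for angle brackets are assumptions made for this example; the message does not say which normalization form or which checks were meant.

```python
import unicodedata

def contains_angle_bracket(text):
    """Toy stand-in for an exploit check: look for ASCII angle brackets."""
    return "<" in text or ">" in text

def is_suspicious(text):
    """Normalize first (NFKC is an assumption here), then run the check."""
    normalized = unicodedata.normalize("NFKC", text)
    return contains_angle_bracket(normalized)

# FULLWIDTH LESS-THAN SIGN and FULLWIDTH GREATER-THAN SIGN around "script".
raw = "\uFF1Cscript\uFF1E"

print(contains_angle_bracket(raw))  # check applied to the raw string
print(is_suspicious(raw))           # check applied after normalization
```

Checking the raw string misses the fullwidth brackets, while checking the normalized string finds their ASCII equivalents, which is why the order of the two steps matters.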

RE: Unicode "no-op" Character?

2019-06-23 Thread Sławomir Osipiuk via Unicode
ot really my specialty, but the approach described in the TR stinks horribly to me. And in my idea, noops would be stripped as part of string sanitization. But the more I consider it, the more I understand such a thing would have had to be built into Unicode at the earliest stages. Basically, i

Re: Unicode "no-op" Character?

2019-06-23 Thread Richard Wordingham via Unicode
On Sat, 22 Jun 2019 21:10:08 -0400 Sławomir Osipiuk via Unicode wrote: > In fact, that might be the best description: It's not just an > "ignorable", it's a "discardable". Unicode doesn't have that, does it? No, though the byte order mark at the start of a file

Re: Unicode "no-op" Character?

2019-06-23 Thread Richard Wordingham via Unicode
On Sat, 22 Jun 2019 23:56:50 + Shawn Steele via Unicode wrote: > + the list. For some reason the list's reply header is confusing. > > From: Shawn Steele > Sent: Saturday, June 22, 2019 4:55 PM > To: Sławomir Osipiuk > Subject: RE: Unicode "no-op" Characte

RE: Unicode "no-op" Character?

2019-06-22 Thread Sławomir Osipiuk via Unicode
That's the key to the no-op idea. The no-op character could not ever be assumed to survive interchange with another process. It'd be canonically equivalent to the absence of character. It could be added or removed at any position by a Unicode-conformant process. A program could wipe all the no-ops

Re: Unicode "no-op" Character?

2019-06-22 Thread Richard Wordingham via Unicode
On Sat, 22 Jun 2019 23:56:11 + Shawn Steele via Unicode wrote: > Assuming you were using any of those characters as "markup", how > would you know when they were intentionally in the string and not > part of your marking system? If they're conveying an invisible mess

RE: Unicode "no-op" Character?

2019-06-22 Thread Sławomir Osipiuk via Unicode
spinning in my head, and now that I've been reading up a lot about Unicode and older standards like 2022/6429, it got me thinking whether there might already be an elegant solution. But, as an example I'm making up right now, imagine you want to packetize a large string. The packets are not all equal

Aw: Unicode "no-op" Character?

2019-06-22 Thread Marius Spix via Unicode
"Sławomir Osipiuk via Unicode" > An: unicode@unicode.org > Betreff: Unicode "no-op" Character? > > Does Unicode include a character that does nothing at all? I'm talking about > something that can be used for padding data without affecting interpretation > of

RE: Unicode "no-op" Character?

2019-06-22 Thread Shawn Steele via Unicode
Assuming you were using any of those characters as "markup", how would you know when they were intentionally in the string and not part of your marking system? -Original Message----- From: Unicode On Behalf Of Richard Wordingham via Unicode Sent: Saturday, June 22, 2019 4:17 PM T

RE: Unicode "no-op" Character?

2019-06-22 Thread Shawn Steele via Unicode
+ the list. For some reason the list's reply header is confusing. From: Shawn Steele Sent: Saturday, June 22, 2019 4:55 PM To: Sławomir Osipiuk Subject: RE: Unicode "no-op" Character? The original comment about putting it between the base character and the combining diacritic seem

Re: Unicode "no-op" Character?

2019-06-22 Thread Richard Wordingham via Unicode
On Sat, 22 Jun 2019 17:50:49 -0400 Sławomir Osipiuk via Unicode wrote: > If faced with the same problem today, I’d > probably just go with U+FEFF (really only need a single char, not a > whole delimited substring) or a different C0 control (maybe SI/LS0) > and clean up the string

RE: Unicode "no-op" Character?

2019-06-22 Thread Sławomir Osipiuk via Unicode
Indeed. There are plenty of control characters that seem useful, but they really aren’t, due to lack of support from common software. Unicode is deliberately silent about most of them, which is fair, but not always convenient. If faced with the same problem today, I’d probably just go with U

Re: Unicode "no-op" Character?

2019-06-22 Thread J Decker via Unicode
On Sat, Jun 22, 2019 at 2:04 PM Sławomir Osipiuk via Unicode < unicode@unicode.org> wrote: > I see there is no such character, which I pretty much expected after > Google didn’t help. > > > > The original problem I had was solved long ago but the recent article > abo

RE: Unicode "no-op" Character?

2019-06-22 Thread Sławomir Osipiuk via Unicode
I see there is no such character, which I pretty much expected after Google didn't help. The original problem I had was solved long ago but the recent article about watermarking reminded me of it, and my question was mostly out of curiosity. The task wasn't, strictly speaking, about "padding",

RE: Unicode "no-op" Character?

2019-06-22 Thread Doug Ewell via Unicode
Sławomir Osipiuk wrote: > Does Unicode include a character that does nothing at all? I'm talking > about something that can be used for padding data without affecting > interpretation of other characters, including combining chars and > ligatures. I join Shawn Steele in wondering wha

Re: Unicode "no-op" Character?

2019-06-22 Thread Rebecca T via Unicode
Perhaps a codepoint from a private use area and another processing step to add/remove them would work for you? On Sat, Jun 22, 2019, 1:39 AM Mark Davis ☕️ via Unicode wrote: > There is nothing like what you are describing. Examples: > >1. Display — There are a few of the Default I
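The private-use suggestion above could look roughly like the following between cooperating processes. This is a sketch under stated assumptions: U+E000 is an arbitrary choice of private-use code point, `add_marks` and `strip_marks` are hypothetical helper names, and the scheme only works if the payload never uses U+E000 for anything else.

```python
MARK = "\uE000"  # arbitrary private-use code point, agreed on privately

def add_marks(text, every):
    """Insert MARK between consecutive chunks of `every` code points."""
    if MARK in text:
        # The marker would be ambiguous if the payload already contains it.
        raise ValueError("text already contains the private marker")
    chunks = [text[i:i + every] for i in range(0, len(text), every)]
    return MARK.join(chunks)

def strip_marks(text):
    """Remove every MARK, restoring the original text."""
    return text.replace(MARK, "")

marked = add_marks("abcdef", 2)
print(repr(marked))
print(strip_marks(marked))
```

Note that slicing by code point can separate a base character from its combining marks, so a real protocol would need to choose split points more carefully; other processes are also free to display or reject the private-use character rather than ignore it.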

Re: Unicode "no-op" Character?

2019-06-22 Thread Mark Davis ☕️ via Unicode
), but there is nothing that all processes will ignore. The only exception would be if some cooperating processes that had agreed beforehand to strip some particular character. Mark On Sat, Jun 22, 2019 at 6:49 AM Sławomir Osipiuk via Unicode < unicode@unicode.org> wrote: > Does Unicode include a

Re: Unicode "no-op" Character?

2019-06-22 Thread Alex Plantema via Unicode
On Saturday, 22 June 2019 02:14, Sławomir Osipiuk via Unicode wrote: Does Unicode include a character that does nothing at all? I'm talking about something that can be used for padding data without affecting interpretation of other characters, including combining chars and ligatures. I.e

Re: Unicode "no-op" Character?

2019-06-21 Thread J Decker via Unicode
Sounds like a great use for ZWNBSP (zero width no-break space, U+FEFF, also used as BOM) or that doesn't break; maybe 'ZERO WIDTH SPACE' (U+200B) On Fri, Jun 21, 2019 at 9:48 PM Sławomir Osipiuk via Unicode < unicode@unicode.org> wrote: > Does Unicode include a character that doe

RE: Unicode "no-op" Character?

2019-06-21 Thread Shawn Steele via Unicode
I'm curious what you'd use it for? From: Unicode On Behalf Of Slawomir Osipiuk via Unicode Sent: Friday, June 21, 2019 5:14 PM To: unicode@unicode.org Subject: Unicode "no-op" Character? Does Unicode include a character that does nothing at all? I'm talking about something that c

Unicode "no-op" Character?

2019-06-21 Thread Sławomir Osipiuk via Unicode
Does Unicode include a character that does nothing at all? I'm talking about something that can be used for padding data without affecting interpretation of other characters, including combining chars and ligatures. I.e. a character that could hypothetically be inserted between a Latin E

Re: unicode tweet

2019-05-30 Thread Asmus Freytag via Unicode
On 5/30/2019 1:07 AM, Andre Schappo via Unicode wrote: This tweet made me laugh twitter.com/padolsey/status/1133835770773626881 勞 André Schappo

unicode tweet

2019-05-30 Thread Andre Schappo via Unicode
This tweet made me laugh twitter.com/padolsey/status/1133835770773626881 勞 André Schappo

Re: asking advice of the Unicode community on new character proposal

2019-05-03 Thread Richard Wordingham via Unicode
On Fri, 3 May 2019 11:01:33 +0300 Jack Rueter via Unicode wrote: > The additional Latin characters to be proposed include Latin capital > and small letters C, D, L, S, T and ɜ with descenders. They also > include a number of Cyrillic letters, capital and small Ukrainian IE > (in

asking advice of the Unicode community on new character proposal

2019-05-03 Thread Jack Rueter via Unicode
Hello! I am looking for advice from the Unicode community. I am working within the Finnish NB on a proposal for additional characters used to write the Komi-Permyak and Komi-Zyrian languages in Latin script in the 1930s (1932-1937 in Komi-Permyak (Latin alone) and 1932-1935 years in Komi

Unicode CLDR 35 beta available for testing

2019-03-18 Thread Rick McGowan via Unicode
The *beta* version of Unicode CLDR 35 <http://cldr.unicode.org/index/downloads/cldr-35> is available for testing. The final release is expected on March 27. Aside from documenting additional structure, there have been important modifications to LDML (scan for the yellow highlighted se

Re: Unicode CLDR 35 alpha available for testing

2019-03-13 Thread Takao Fujiwara via Unicode
Thank you. On 2019/03/06 14:20, Mark Davis ☕️ via Unicode-san wrote: Just via svn checkout for the alpha. By next time we plan to be on GitHub... {phone} On Thu, Feb 28, 2019, 13:07 Doug Ewell via Unicode <unicode@unicode.org> wrote: announcements at unicode.org

Re: Unicode CLDR 35 alpha available for testing

2019-03-05 Thread Mark Davis ☕️ via Unicode
Just via svn checkout for the alpha. By next time we plan to be on GitHub... {phone} On Thu, Feb 28, 2019, 13:07 Doug Ewell via Unicode wrote: > announcements at unicode.org wrote: > > > The alpha version of Unicode CLDR 35 > > <http://cldr.unicode.org/index/downloads/

Re: Unicode CLDR 35 alpha available for testing

2019-02-28 Thread Doug Ewell via Unicode
announcements at unicode.org wrote: > The alpha version of Unicode CLDR 35 > <http://cldr.unicode.org/index/downloads/cldr-35> is available for > testing. No downloadable data files in the sense of released builds, correct? -- Doug Ewell | Thornton, CO, US | ewellic.org

Re: Unicode String Models

2018-11-22 Thread Henri Sivonen via Unicode
On Tue, Oct 2, 2018 at 3:04 PM Mark Davis ☕️ wrote: > > * The Python 3.3 model mentions the disadvantages of memory usage >> cliffs but doesn't mention the associated performance cliffs. It would >> be good to also mention that when a string manipulation causes the >> storage to expand or

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Steffen Nurpmeso via Unicode
Philippe Verdy via Unicode wrote in : |Padding itself does not clearly indicate the length. | |It's an artefact that **may** be inferred only in some other layers \ |of protocols which specify when and how padding is needed (and how \ |many padding bytes |are required or accepted), it works

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Philippe Verdy via Unicode
g explicitly modified to suit an embedding protocol. > > And certainly the first sentence in this section isn’t intended to be > taken without the context of the rest of the section. > > > > tex > > > > > > > > *From:* Philippe Verdy [mailto:verd...@wanadoo.fr]

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-15 Thread Peter Saint-Andre via Unicode
On 10/14/18 3:59 PM, Philippe Verdy via Unicode wrote: > > > On Sun, 14 Oct 2018 at 21:21, Doug Ewell via Unicode > <unicode@unicode.org> wrote: > > Steffen Nurpmeso wrote: > > > Base64 is defined in RFC 2045 (Multipurpose Internet Mail E
