Unicode education in the professional world

2017-07-07 Thread Doug Ewell via Unicode
Sort of along the lines of "education"... I've been helping a colleague who is using the Oracle database and trying to work through a customer's character conversion and mojibake issues. I started suspecting the NLS_LANG variable and looked up some references, and found the following alternative

Re: Unicode education in UK Schools

2017-07-07 Thread Doug Ewell via Unicode
Asmus Freytag wrote: > I've not (yet) located any assignments that try to address any of the > "tricky" issues in the use of Unicode. That might be a good thing. Many introductory lessons or chapters or talks about Unicode dive almost immediately into the complexities and weirdnesses, much more

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Doug Ewell via Unicode
Costello, Roger L. wrote: > Suppose an application splits a UTF-8 multi-octet sequence. The > application then sends the split sequence to a client. The client must > restore the original sequence. > > Question: is it possible to split a UTF-8 multi-octet sequence in such > a way that the client
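A minimal sketch of the splitting scenario in the question above, assuming the client receives the pieces in order and uses Python's standard incremental decoder. The helper name `decode_chunks` and the sample string are illustrative only and do not come from the thread.

```python
import codecs


def decode_chunks(chunks):
    """Decode byte chunks that may split a UTF-8 multi-octet sequence."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    # The decoder buffers an incomplete trailing sequence until the next chunk.
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises if a sequence is still incomplete at the end.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


original = "na\u00efve \u263a"
data = original.encode("utf-8")
# Split between the two octets (C3 | AF) that encode U+00EF.
chunks = [data[:3], data[3:]]
restored = decode_chunks(chunks)
```

Because the pieces are concatenated in their original order, the byte stream the decoder sees is identical to the unsplit one.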

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread Doug Ewell via Unicode
J Decker wrote: > I generally accepted any utf-8 encoding up to 31 bits though ( since > I was going from the original spec, and not what was effective limit > based on unicode codepoint space) Hey, everybody: Don't do that. UTF-8 has been constrained to the Unicode code space (maximum
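The constraint described above can be checked with a short sketch, assuming a conformant decoder such as the one built into Python 3. The helper name `is_well_formed_utf8` and the sample byte strings are illustrative.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 under Python's strict decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Highest code point in the Unicode code space, U+10FFFF: F4 8F BF BF.
max_scalar = "\U0010FFFF".encode("utf-8")
# Structurally a four-byte sequence, but it would denote U+110000.
beyond = bytes([0xF4, 0x90, 0x80, 0x80])
# A five-byte form from the older, wider definition.
five_byte = bytes([0xF8, 0x88, 0x80, 0x80, 0x80])
```

Only the first of the three byte strings is accepted; the other two fall outside the code space and are rejected.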

Re: First bonafide use (≠ mention) of emoji by an academic publisher?

2017-07-23 Thread Doug Ewell via Unicode
Leonardo Boiko wrote: To my boundless, heartbreaking disappointment, these emojis are not U+1F4D8 BLUE BOOKs  from a custom @css font, but rather private-use U+F02Ds, which index a book glyph in some icon pack called Font Awesome . At least they're

Re: LATIN CAPITAL LETTER SHARP S officially recognized

2017-07-03 Thread Doug Ewell via Unicode
a.lukyanov wrote: > Is it possible to design fonts that will render ẞ as SS? > > So we could choose between ẞ and SS by just selecting the proper font, > without changing the text itself. > > Or perhaps there will be a "font feature" to select this rendering > within the same font. I thought

If at first... (was: RE: CSUR and UCSUR glyphs)

2017-05-09 Thread Doug Ewell via Unicode
I wrote: > I was never able to find some of these scripts, such as Pikto http://unifoundry.com/pikto/index.html Never hurts to try again. -- Doug Ewell | Thornton, CO, US | ewellic.org

Re: CSUR and UCSUR glyphs

2017-05-09 Thread Doug Ewell via Unicode
Michael Bear wrote: > Some of the glyphs were no problem, such as the Tengwar and Cirth > ones, because their pages actually show the glyphs on their pages. > > Others do not, which poses a bit of a problem. [...] As you probably read on both the CSUR and UCSUR sites, neither is sponsored or

Re: Human Rights translations

2017-05-10 Thread Doug Ewell via Unicode
Mats Blakstad wrote: > Who is at the moment organizing the human rights translations in > Unicode? How can we submit new translations? http://www.unicode.org/udhr/contributing.html -- Doug Ewell | Thornton, CO, US | ewellic.org

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Doug Ewell via Unicode
Richard Wordingham wrote: >> It is not at all clear what the intent of the encoder was - or even >> if it's not just a problem with the data stream. E0 80 80 is not >> permitted, it's garbage. An encoder can't "intend" it. > > It was once a legal way of encoding NUL, just like C0 E0, which is >
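As a rough illustration of the point about E0 80 80, the sketch below assumes Python 3's built-in UTF-8 decoder as the conformant process. The helper name `strict_decode_fails` is illustrative. The number of U+FFFD characters produced in replacement mode is a policy choice, which is what this thread debates, so the sketch does not rely on it.

```python
# Overlong three-byte form that would once have been read as NUL.
overlong_nul = bytes([0xE0, 0x80, 0x80])


def strict_decode_fails(data: bytes) -> bool:
    """Return True if the strict UTF-8 decoder rejects data."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return True
    return False


# With errors="replace" the ill-formed bytes become U+FFFD characters.
replaced = overlong_nul.decode("utf-8", errors="replace")
only_replacement_chars = set(replaced) == {"\ufffd"}
```

The shortest form, a single 00 byte, remains the only well-formed UTF-8 encoding of U+0000.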

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Doug Ewell via Unicode
Richard Wordingham wrote: > So it was still a legal way for a non-UTF-8-compliant process! Anything is possible if you are non-compliant. You can encode U+263A with 9,786 FF bytes followed by a terminating FE byte and call that "UTF-8," if you are willing to be non-compliant enough. > Note for

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Doug Ewell via Unicode
Hans Åberg wrote: >> Far from solving the stated problem, it would introduce a new one: >> conversion from the "bad data" Unicode code points, currently >> well-defined, would become ambiguous. > > Actually not: just translate the invalid UTF-8 sequences into invalid > UTF-32. Far from solving

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Doug Ewell via Unicode
Hans Åberg wrote: > It would be useful, for use with filesystems, to have Unicode > codepoint markers that indicate how UTF-8, including non-valid > sequences, is translated into UTF-32 in a way that the original > octet sequence can be restored. I have always argued strongly against this idea,

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Doug Ewell via Unicode
Henri Sivonen wrote: > I find it shocking that the Unicode Consortium would change a > widely-implemented part of the standard (regardless of whether Unicode > itself officially designates it as a requirement or suggestion) on > such flimsy grounds. > > I'd like to register my feedback that I

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Doug Ewell via Unicode
Richard Wordingham wrote: I'm afraid I don't get the analogy. You can't build a full Unicode system out of Unicode-compliant parts. Others will have to address Richard's point about canonical-equivalent sequences. However, having dug out Unicode Version 2 Appendix A Section 2 UTF-8 (in

Re: 10.0 Code Charts

2017-06-22 Thread Doug Ewell via Unicode
Michael Bear wrote: > When are the code charts (http://www.unicode.org/charts/) going to be > updated for Unicode 10.0? They look fine to me. -- Doug Ewell | Thornton, CO, US | ewellic.org

Re: Petition to ban Google from designing emoji

2017-05-18 Thread Doug Ewell via Unicode
Asmus Freytag wrote: >> Given that one co-chair of the Emoji Subcommittee is from Apple and >> the other is from Google, you may wish to rethink your expectations >> about all this. > > I'd expect "zelpa" to feel validated by this info in their concern, > wouldn't you? Well, it's public

Re: Petition to ban Google from designing emoji

2017-05-18 Thread Doug Ewell via Unicode
zelpa wrote: > This is my real issue, Apple disregards guidelines, sets a de facto > standard, Google races to copy them. It's actually sad to see how far > other vendors will go to copy Apple's designs. I honestly think the > consortium should try harder to enforce the guidelines instead of >

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-23 Thread Doug Ewell via Unicode
Asmus Freytag (c) wrote: > And why add a recommendation that changes that from completely up to > the implementation (or groups of implementations) to something where > one way of doing it now has to justify itself? A recommendation already exists, at the end of Section 3.9. The current

Team Emoji

2017-05-19 Thread Doug Ewell via Unicode
http://www.cnn.com/2017/05/19/us/emoji-redhead-curly-black-hair-trnd/index.html "Team Emoji (aka the Unicode Consortium) has approved some well-recieved [sic] updates to the visual lexicon we've all come to love. One of the most recent updates included black hearts and a unicorn, and they also

RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Doug Ewell via Unicode
L2/17-168 says: "For UTF-8, recommend evaluating maximal subsequences based on the original structural definition of UTF-8, without ever restricting trail bytes to less than 80..BF. For example: is a single maximal subsequence because C0 was originally a lead byte for two-byte sequences." When

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-30 Thread Doug Ewell via Unicode
On 05/30/2017 02:30 PM, Doug Ewell via Unicode wrote: > L2/17-168 says: > > "For UTF-8, recommend evaluating maximal subsequences based on the > original structural definition of UTF-8, without ever restricting trail > bytes to less than 80..BF. For example: is a s

Looking for 8-bit computer designers

2017-05-30 Thread Doug Ewell via Unicode
Not as OT as it might seem: If there are any engineers or designers on this list who worked on 8-bit and early 16-bit legacy computers (Apple II, Atari, Commodore, Tandy, etc.), and especially on character set design for these machines, please contact me privately at . Any desired degree of

Re: Encoding of character for new Japanese era name after Heisei

2017-06-02 Thread Doug Ewell via Unicode
> Anyway, since emperor Akihito (明仁), the era starting in 1989 is no > longer named after the emperor, but is Heisei (平成) "Peace everywhere". > This already occurred in the past on the Ningo system. There's no > absolute requirement to change the era name even if there's a new > Emperor named. The

RE: Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-05 Thread Doug Ewell via Unicode
Martin J. Dürst wrote: > Assuming (conservatively) that it will take about a century to fill up > all 17 (well, actually 15, because two are private) planes, this would > give us another century. Current estimates seem to indicate that 800 years is closer to the mark. -- Doug Ewell | Thornton,

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Doug Ewell via Unicode
Henri Sivonen wrote: > If anything, I hope this thread results in the establishment of a > requirement for proposals to come with proper research about what > multiple prominent implementations do about the subject matter of a > proposal concerning changes to text about implementation behavior.

Running out of code points, redux (was: Re: Feedback on the proposal...)

2017-06-01 Thread Doug Ewell via Unicode
Richard Wordingham wrote: > even supporting 6-byte patterns just in case 20.1 bits eventually turn > out not to be enough, Oh, gosh, here we go with this. What will we do if 31 bits turn out not to be enough? -- Doug Ewell | Thornton, CO, US | ewellic.org

RE: Looking for 8-bit computer designers

2017-06-14 Thread Doug Ewell via Unicode
Philippe Verdy wrote: > These old platforms still have their fans which are easily found on > social networks. [...] We know this. That's why a group of us is working on a proposal to add missing characters from these platforms. Some of the platforms have really obscure and hard-to-decipher

RE: L2/18-181

2018-05-17 Thread Doug Ewell via Unicode
I wrote: > ক্ is a conjunct consisting of three code points s/ক্/ক্ষ/ -- Doug Ewell | Thornton, CO, US | ewellic.org
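The corrected claim can be verified with a small sketch using Python's standard `unicodedata` module. The variable names are illustrative.

```python
import unicodedata

# The Bengali conjunct written with escapes: KA + VIRAMA + SSA.
kssa = "\u0995\u09cd\u09b7"

# One displayed conjunct, but three code points in the string.
code_point_count = len(kssa)
names = [unicodedata.name(ch) for ch in kssa]
```

`names` lists the character name of each code point in order, which makes the consonant, virama, consonant structure of the conjunct visible.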

RE: L2/18-181

2018-05-17 Thread Doug Ewell via Unicode
Everyone, I was not serious about this proposal being "fascinating" or in any way a model for what should happen with the Bengali script. Please imagine a tongue-in-cheek expression as you re-read my post. Maybe there is an emoji that depicts this. Maybe I've just been away from the list too

Re: L2/18-181

2018-05-17 Thread Doug Ewell via Unicode
Otto Stolz wrote: > I wonder how English and French ever could > be made to use a single script, let alone > German (???), Icelandic (???), Swedish (???), > Latvian (???), Czech (???) or ? you name it. They do use the same script, Latin. They do not use the same alphabet. Each language has its

L2/18-181

2018-05-16 Thread Doug Ewell via Unicode
http://www.unicode.org/L2/L2018/18181-n4947-assamese.pdf This is a fascinating proposal to disunify the Assamese script from Bengali on the following bases: 1. The identity of Assamese as a script distinct from Bengali is in jeopardy. 2. Collation is different between the Assamese and Bengali

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-28 Thread Doug Ewell via Unicode
SundaraRaman R wrote: but the very common pulli (VIRAMA) is neither in Lo nor has 'Other_Alphabetic', and so leads to concluding any string containing it to be non-alphabetic. Is this definition part of Unicode? I thought the use of General Category to answer questions like "this sequence is
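A short sketch of the behaviour under discussion, assuming Python as the example environment. Python's `str.isalpha()` is documented as a test on General Category (Lm, Lt, Lu, Ll, Lo) rather than on the Unicode Alphabetic property, so it is one concrete case of the approach being questioned. The variable names are illustrative.

```python
import unicodedata

pulli = "\u0bcd"        # TAMIL SIGN VIRAMA
ka = "\u0b95"           # TAMIL LETTER KA
word = ka + pulli       # the consonant with its inherent vowel suppressed

# The pulli is a nonspacing mark, not a letter.
category = unicodedata.category(pulli)

# A category-based test accepts the bare letter but not letter + pulli.
letter_is_alpha = ka.isalpha()
word_is_alpha = word.isalpha()
```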

Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

2018-05-29 Thread Doug Ewell via Unicode
Richard Wordingham wrote: >>> The effects of virama that spring to mind are: >>> >>> (a) Causing one or both letters on either side to change or combine >>> to indicate combination; >>> >>> (b) Appearing as a mark only if it does not affect one of the >>> letters on either side; >>> >>> (c)

Re: Hyphenation Markup

2018-06-02 Thread Doug Ewell via Unicode
Richard Wordingham wrote: What about U+200B ZWSP? Thanks for the suggestion, but it's not likely to work: Are you asking what schemes exist, or are you trying to call attention to some rendering engine and/or font that doesn't render a combination as it should? 1) In the sequence

Re: Italic mu in squared Latin abbreviations?

2018-06-20 Thread Doug Ewell via Unicode
Ivan Panchenko wrote: > Is there a reason why the mu does not appear upright It was probably italicized in the glyphs printed in the relevant Japanese standard, back in the 1990s. The glyphs in the Unicode charts are not normative, except for a very small handful of encoded characters like

Re: Linearized tilde?

2017-12-30 Thread Doug Ewell via Unicode
David Starner wrote: "The letter is not included in any current spelling and is not included in Unicode." Should it be? Did anyone ever use the 1982 alphabet, other than Mann and Dalby? If not, I wonder if this letter is a bit like the "proposed new punctuation marks" that show up in

Re: Linearized tilde?

2017-12-30 Thread Doug Ewell via Unicode
Philippe Verdy wrote: Isn't it a rounded variant of Latin letter n ? Then it could exist also in uppercase form (like "n" and "N") A defining characteristic of the 1982 African Reference Alphabet was that it was lowercase-only. An uppercase form would be an invention with no basis in

Re: Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10))

2018-01-15 Thread Doug Ewell via Unicode
On January 5, Mark Davis wrote: Doug, I modified my working draft, at https://docs.google.com/document/d/1EuNjbs0XrBwqlvCJxra44o3EVrwdBJUWsPf8Ec1fWKY If that looks ok, I'll submit. Sorry for the delay. The text substitutions look fine. -- Doug Ewell | Thornton, CO, US | ewellic.org

Re: 0027, 02BC, 2019, or a new character?

2018-01-24 Thread Doug Ewell via Unicode
James Kass wrote: > Heh. We are offering sound advice. If people fail to heed it, that's > too bad. We're offering excellent advice, very well informed. But the leadership has made the decision that it has made. All the news stories say that linguistic experts in Kazakhstan offered similar good

Re: 0027, 02BC, 2019, or a new character?

2018-01-25 Thread Doug Ewell via Unicode
Philippe Verdy wrote: > I agree, and still you won't necessarily have to press a dead key to > have these characters, if you map one key where the Cyrillic letter > was > producing directly the character with its accent. [...] > > However, if you can type one key to produce one latin letter with

Re: 0027, 02BC, 2019, or a new character?

2018-01-25 Thread Doug Ewell via Unicode
Philippe Verdy wrote: So there will be a new administrative jargon in Kazakhstan that people won't like, and outside the government, they'll continue using their existing keyboards [...] Newspapers and books will continue for a while being published in Cyrillic [...] Yes, it will be a

RE: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?)

2018-01-29 Thread Doug Ewell via Unicode
Marcel Schneider wrote: > Prior to this thread, I believed that the ratio of Windows users > liking the US-International vs Mac users liking the US-Extended was > like other “Windows implementation” vs “Apple implementation” ratios. For many users, it may not be a question of what they like, but

Re: Keyboard layouts and CLDR

2018-01-30 Thread Doug Ewell via Unicode
Marcel Schneider wrote: > That tends to prove that Mac users accept changes, while Windows users > refuse changes. I was going to say that was a gross over-generalization, but that didn't adequately express how gross it was. It's just plain wrong. Pardon my bluntness. How about: Windows is

RE: Keyboard layouts and CLDR

2018-01-30 Thread Doug Ewell via Unicode
Marcel Schneider wrote: >> http://recycledknowledge.blogspot.com/2013/09/us-moby-latin-keyboard-for-windows.html > > Sadly the downloads are still unavailable (as formerly discussed). But > I saved in time, too (June 2015). Sorry, try this: http://vrici.lojban.org/~cowan/MobyLatinKeyboard.zip

Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?)

2018-01-28 Thread Doug Ewell via Unicode
Marcel Schneider wrote: We can only hope that now, CLDR is thoroughly re-engineering the way international or otherwise extended keyboards are mapped. I suspect you already know this and just misspoke, but CLDR doesn't prescribe any vendor's keyboard layouts. CLDR mappings reflect what

Re: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?)

2018-01-28 Thread Doug Ewell via Unicode
Mark Davis wrote: One addition: with the expansion of keyboards in http://blog.unicode.org/2018/01/unicode-ldml-keyboard-enhancements.html we are looking to expand the repository to not merely represent those, but to also serve as a resource that vendors can draw on. Would you say, then, that

RE: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?)

2018-01-29 Thread Doug Ewell via Unicode
(b) it doesn't ship with Windows Of course that is not a "luxury." Knowing that third-party options are available, let alone free and easily installed ones, is the luxury. -- Doug Ewell | Thornton, CO, US | ewellic.org

+1 (was: Re: Why so much emoji nonsense?)

2018-02-15 Thread Doug Ewell via Unicode
Philippe Verdy wrote: If people don't know how to read and cannot reuse the content and transmit it, they become just consumers and in fact less and less productors or creators of contents. Just look at opinions under videos, most of them are just "thumbs up", "like", "+1", barely counted only,

Re: Unicode of Death 2.0

2018-02-17 Thread Doug Ewell via Unicode
Manish Goregaokar wrote: FWIW I dissected the crashing strings, it's basically all sequences in Telugu, Bengali, Devanagari where the consonant is suffix-joining (ra in Devanagari, jo and ro in Bengali, and all Telugu consonants), the vowel is not

Non-RGI sequences are not emoji? (was: Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10))

2018-01-02 Thread Doug Ewell via Unicode
Mark Davis wrote: BTW, relevant to this discussion is a proposal filed http://www.unicode.org/L2/L2017/17434-emoji-rejex-uts51-def.pdf (The date is wrong, should be 2017-12-22) The phrase "emoji regex" had caused me to ignore this document, but I took a look based on this thread. It says "we

RE: Private Use areas (was: Re: Thoughts on working with the Emoji Subcommittee (was ...))

2018-08-20 Thread Doug Ewell via Unicode
Mark Davis wrote: > The only caution I would give is that people shouldn't expect general > purpose software to do anything with PUA text that depends on > character properties. Very true, and a good point. People with creative PUA ideas do sometimes expect this to magically work. I have
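To illustrate the caution above, here is a minimal sketch of what general-purpose software can learn about a private-use code point, assuming Python's `unicodedata` module as a stand-in for such software. The helper name `describe` is illustrative.

```python
import unicodedata


def describe(ch: str):
    """Return (general category, character name or None) for one character."""
    return unicodedata.category(ch), unicodedata.name(ch, None)


# First code point of the BMP Private Use Area.
pua_info = describe("\ue000")
# An ordinary assigned letter, for comparison.
letter_info = describe("A")
```

The private-use character reports only the generic category Co and no name, so any custom properties a private agreement assigns to it are invisible to this kind of lookup.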

Re: Private Use areas

2018-08-21 Thread Doug Ewell via Unicode
Ken Whistler wrote: > The way forward for folks who want to do this kind of thing is: > > 1. Define a *protocol* for reliable interchange of custom character > property information about PUA code points. I've often thought that would be a great idea. You can't get to steps 2 and 3 without step 1.

Re: Private Use areas

2018-08-28 Thread Doug Ewell via Unicode
On August 23, 2011, Asmus Freytag wrote: > On 8/23/2011 7:22 AM, Doug Ewell wrote: >> Of all applications, a word processor or DTP application would want >> to know more about the properties of characters than just whether >> they are RTL. Line breaking, word breaking, and case mapping come to >>

Re: Unicode Digest, Vol 56, Issue 20

2018-08-30 Thread Doug Ewell via Unicode
UnicodeData.txt was devised long before any of the other UCD data files. Though it might seem like a simple enhancement to us, adding a header block, or even a single line, would break a lot of existing processes that were built long ago to parse this file. So Unicode can't add a header to this
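The fragility described above can be sketched with a deliberately naive parser of the kind that might have been written long ago. The function `parse_line` and the header text are hypothetical. The sample record is the UnicodeData.txt entry for U+0041, which has 15 semicolon-separated fields.

```python
FIELD_COUNT = 15  # UnicodeData.txt records have 15 semicolon-separated fields


def parse_line(line: str) -> dict:
    """Parse one UnicodeData.txt record, assuming every line is a record."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    return {
        "code_point": int(fields[0], 16),
        "name": fields[1],
        "general_category": fields[2],
    }


sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_line(sample)

# A hypothetical header line is not a record, so this parser would fail on it.
try:
    parse_line("# UnicodeData (hypothetical header)")
    header_accepted = True
except ValueError:
    header_accepted = False
```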

Re: EOL conventions (was: Re: UCD in XML or in CSV? (is: UCD

2018-09-08 Thread Doug Ewell via Unicode
To finish (I hope) this thread: 1. Glad to know that Notepad is getting some modern updates, even if belatedly. 2. Sorry that there are still tools out there, on different platforms, that can't handle each other's EOL conventions. (Of course, this is the problem Unicode was trying to solve

Re: UCD in XML or in CSV? (is: UCD in YAML)

2018-09-06 Thread Doug Ewell via Unicode
Marcel Schneider wrote: > BTW what I conjectured about the role of line breaks is true for CSV > too, and any file downloaded from UCD on a semicolon separator basis > becomes unusable when displayed straight in the built-in text editor > of Windows, given Unicode uses Unix EOL. It's been well

Re: Unicode, emoji and Sundar Pichai

2018-07-13 Thread Doug Ewell via Unicode
Yuhong Bao wrote: > I wonder how much Sundar Pichai (CEO of Google) participate in Unicode > (especially the emoji part)? > Would he be interested in Unicode UTC meetings for example? Google currently has a representative on the Unicode Board of Directors (Bob Jung), the Unicode Consortium

SignWriting in U+40000 block

2018-01-22 Thread Doug Ewell via Unicode
The IETF is noting the progress of an updated draft: Formal SignWriting draft-slevinski-formal-signwriting-04 https://tools.ietf.org/html/draft-slevinski-formal-signwriting-04.html which continues to describe an implementation of SignWriting in the as-yet unassigned Plane 4, including a detailed

Re: 0027, 02BC, 2019, or a new character?

2018-01-23 Thread Doug Ewell via Unicode
I think it's so cute that some of us think we can advise Nazarbayev on whether to use straight or curly apostrophes or accents or x's or whatever. Like he would listen to a bunch of Western technocrats. An explicitly stated goal of the new orthography was to enable typing Kazakh on a "standard

RE: 0027, 02BC, 2019, or a new character?

2018-01-23 Thread Doug Ewell via Unicode
Philippe Verdy wrote: > The best they should have done is instead keeping their existing > keyboard layout, containing both the Cyrillic letters and Latin QWERTY > printed on them, but operating in two modes (depending on OS > preferences) to invert the two layouts but without changing the >

Re: base1024 encoding using Unicode emojis

2018-03-11 Thread Doug Ewell via Unicode
Oh, let him have a little fun. At least he's using emoji for something related to characters, instead of playing Mr. Potato Head. Incidentally, more prior art on large-base encoding: https://sites.google.com/site/markusicu/unicode/base16k -- Doug Ewell | Thornton, CO, US | ewellic.org

Missing Kazakh Latin letters (was: Re: 0027, 02BC, 2019, or a new character?)

2018-02-27 Thread Doug Ewell via Unicode
Michael Everson wrote: > Why on earth would they use Ch and Sh when 1) C isn’t used by itself > and 2) if you’re using Ǵǵ you may as well use Çç Şş. Philippe Verdy wrote: > The three versions of the Cyrillic letter i is mapped to 1.5 > (distinguished only on lowercase with the Turkic lowercase

RE: Fwd: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode

2018-04-02 Thread Doug Ewell via Unicode
Martin J. Dürst wrote: > Please enjoy. Sorry for being late with forwarding, at least in some > parts of the world. Unfortunately, we know some folks will look past the humor and use this as a springboard for the recurring theme "Yes, what *will* we do when Unicode runs out of code points?" I

RE: Unicode Emoji 11.0 characters now ready for adoption!

2018-03-01 Thread Doug Ewell via Unicode
Tim Partridge wrote: > Perhaps the CLDR work the Consortium does is being referenced. That is > by language on this list > http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ee > By the time it gets to the 100th entry the Modern percentage has "room > for improvement". I

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread Doug Ewell via Unicode
J Decker wrote: >> How about the opposite direction: If m is base64 encoded to yield t >> and then t is base64 decoded to yield n, will it always be the case >> that m equals n? > > False. > Canonical translation may occur which the different base64 may be the > same sort of string... Base64 is

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-14 Thread Doug Ewell via Unicode
Steffen Nurpmeso wrote: Base64 is defined in RFC 2045 (Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies). Base64 is defined in RFC 4648, "The Base16, Base32, and Base64 Data Encodings." RFC 2045 defines a particular implementation of base64, specific
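The round-trip question in this thread can be tried out with a small sketch. Base64 operates on bytes, so the choice of UTF-8 to turn text into bytes is an assumption made here for illustration, as are the helper name `roundtrip` and the sample strings.

```python
import base64
import unicodedata


def roundtrip(text: str) -> str:
    """Encode text as UTF-8, base64-encode, then reverse both steps."""
    encoded = base64.b64encode(text.encode("utf-8"))
    return base64.b64decode(encoded).decode("utf-8")


nfc = "\u00e9"       # precomposed e with acute
nfd = "e\u0301"      # e followed by COMBINING ACUTE ACCENT

# Canonically equivalent strings are different byte sequences,
# so their base64 forms differ and each round-trips to itself.
b64_nfc = base64.b64encode(nfc.encode("utf-8"))
b64_nfd = base64.b64encode(nfd.encode("utf-8"))
```

Base64 itself never normalizes anything. Any normalization would have to happen in a separate step before encoding or after decoding.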

Re: A sign/abbreviation for "magister"

2018-10-29 Thread Doug Ewell via Unicode
Richard Wordingham wrote: >> I like palaeographic renderings of text very much indeed, and in fact >> remain in conflict with members of the UTC (who still, alas, do NOT >> communicate directly about such matters, but only in duelling ballot >> comments) about some actually salient

Re: A sign/abbreviation for "magister"

2018-10-30 Thread Doug Ewell via Unicode
Julian Bradfield wrote: >> in the 17ᵗʰ or 18ᵗʰ century to keep it only for ordinals. Should >> Unicode > > What do you mean, for ordinals? If you mean 1st, 2nd etc., then there > is not now (when superscripting looks very old-fashioned) and never > has been any requirement to superscript them,

[getting OT] Re: A sign/abbreviation for "magister"

2018-10-30 Thread Doug Ewell via Unicode
Marcel Schneider replied to Khaled Hosny: >>> E.g. in Arabic script, superscript is considered worth encoding and >>> using without any caveat, [...] >> >> Curious, what Arabic superscripts are encoded in Unicode? > > [...] There is the range U+FC5E..U+FC63 (presentation forms). Arabic

Re: A sign/abbreviation for "magister"

2018-11-02 Thread Doug Ewell via Unicode
Michael Everson wrote: > I write my 7’s and Z’s with a horizontal line through them. Ƶ is > encoded not for this purpose, but because Z and Ƶ are distinct in > orthographies for varieties of Tatar, Chechen, Karelian, and > Mongolian. This is a contemporary writing convention but it does not >

Re: A sign/abbreviation for "magister"

2018-11-02 Thread Doug Ewell via Unicode
Do we have any other evidence of this usage, besides a single handwritten postcard? -- Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding (was: Re: A sign/abbreviation for "magister")

2018-11-05 Thread Doug Ewell via Unicode
Philippe Verdy wrote: > Note that I actually propose not just one rendering for the abbreviation mark > but two possible variants (that would be equally > valid without preference). Actually you're not proposing them. You're talking about them (at length) on the public mailing list. If you want

Re: Encoding italic

2019-01-21 Thread Doug Ewell via Unicode
James Kass wrote: > Even the enthusiasts among us seldom take the trouble to include > ‘proper’ quotes and apostrophes in e-mails — even for posting to > specialized lists such as this one where other members might notice > and appreciate the extra effort involved. Well, definitely not to this

Re: Encoding italic (was: A last missing link)

2019-01-21 Thread Doug Ewell via Unicode
Kent Karlsson wrote: > There is already a standardised, "character level" (well, it is from > a character standard, though a more modern view would be that it is > a higher level protocol) way of specifying italics (and bold, and > underline, and more): > > \u001b[3mbla bla bla\u001b[0m > >

Re: The encoding of the Welsh flag

2018-11-22 Thread Doug Ewell via Unicode
Ken Whistler replied to Michael Everson: What really annoys me about this is that there is no flag for Northern Ireland. The folks at CLDR did not think to ask either the UK or the Irish representatives to SC2 about this. [...] If you or Andrew West or anyone else is interested in pursuing

Re: The encoding of the Welsh flag

2018-11-22 Thread Doug Ewell via Unicode
Christoph Päper wrote: We have gotten requests for this, but the stumbling block is the lack of an official N. Ireland document describing what the official flag is and should look like. Such documents are lacking for several of the RIS flag emojis as well, though, e.g. for  from ISO 3166-1

Re: Where is my character @?

2019-01-09 Thread Doug Ewell via Unicode
James Kass wrote: > It's probably old-fashioned to say that technology should be forced to > accommodate people rather than the other way around. But it's good to > note that efforts are still being made on behalf of the users to make > progress towards U.C.S. inclusion. I'm as opposed to this

Re: Unihan variants information

2019-01-28 Thread Doug Ewell via Unicode
Michel MARIANI wrote: > I've developed an open-source, multi-platform desktop application > called Unicode Plus Before you get too heavily invested in this product name, you may want to: 1. check out the page "Unicode® Copyright and Terms of Use" located at

Re: Unicode CLDR 35 alpha available for testing

2019-02-28 Thread Doug Ewell via Unicode
announcements at unicode.org wrote: > The alpha version of Unicode CLDR 35 > is available for > testing. No downloadable data files in the sense of released builds, correct? -- Doug Ewell | Thornton, CO, US | ewellic.org

Re: Encoding italic

2019-02-08 Thread Doug Ewell via Unicode
I'd like to propose encoding italics and similar display attributes in plain text using the following stateful mechanism: • Italics on: ESC [3m • Italics off: ESC [23m • Bold on: ESC [1m • Bold off: ESC [22m • Underline on: ESC [4m • Underline off: ESC [24m •
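The on/off pairs listed in this message can be sketched as a small helper. The function name `style` and the table name `SGR` are hypothetical, and only the three attributes visible in the excerpt are included.

```python
ESC = "\x1b"  # the ESC control character, U+001B

# Attribute name -> (parameter that turns it on, parameter that turns it off)
SGR = {
    "italic": ("3", "23"),
    "bold": ("1", "22"),
    "underline": ("4", "24"),
}


def style(text: str, attribute: str) -> str:
    """Wrap text in the on and off sequences for one display attribute."""
    on, off = SGR[attribute]
    return f"{ESC}[{on}m{text}{ESC}[{off}m"


italic_sample = style("bla bla bla", "italic")
```

Using the attribute-specific "off" code rather than a global reset means nested attributes can be closed independently. This is a design choice of the sketch, not something the excerpt spells out.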

Re: Encoding italic

2019-02-10 Thread Doug Ewell via Unicode
Egmont Koblinger wrote: > There are a lot of problems with these escape sequences, and if you go > for a potentially new standard, you might not want to carry these > problems. As others have pointed out, I am suggesting the use of some profile of ISO 6429 within plain text to implement these

Re: Encoding italic

2019-01-29 Thread Doug Ewell via Unicode
Kent Karlsson wrote: > We already have a well-established standard for doing this kind of > things... I thought we were having this discussion because none of the existing methods, no matter how well documented, has been accepted on a widespread basis as "the" standard. Some people dislike

Re: Encoding italic

2019-01-29 Thread Doug Ewell via Unicode
Philippe Verdy replied to James Kass: > You're not very explicit about the Tag encoding you use for these > styles. Of course, it was Andrew West who implemented the styling mechanism in a beta release of BabelPad. James was just reporting on it. > And what is then the interest compared to

Re: Encoding italic

2019-01-29 Thread Doug Ewell via Unicode
Martin J. Dürst wrote: > Here's a little dirty secret about these tag characters: They were > placed in one of the astral planes explicitly to make sure they'd use > 4 bytes per tag character, and thus quite a few bytes for any actual > complete tags. See https://tools.ietf.org/html/rfc2482 for

Re: Encoding italic

2019-01-30 Thread Doug Ewell via Unicode
Martin J. Dürst wrote: > Here's a little dirty secret about these tag characters: They were > placed in one of the astral planes explicitly to make sure they'd use > 4 bytes per tag character, and thus quite a few bytes for any actual > complete tags. Aha. That explains why SCSU had to be
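The size cost mentioned above is easy to confirm with a sketch. The variable names are illustrative, and U+E0041 TAG LATIN CAPITAL LETTER A stands in for any tag character.

```python
tag_a = "\U000E0041"   # TAG LATIN CAPITAL LETTER A, in Plane 14

# A supplementary-plane character takes four bytes in UTF-8 and,
# as a surrogate pair, four bytes in UTF-16 as well.
utf8_len = len(tag_a.encode("utf-8"))
utf16_len = len(tag_a.encode("utf-16-le"))

# The ASCII letter it shadows takes a single byte in UTF-8.
ascii_len = len("A".encode("utf-8"))
```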

Re: Encoding italic

2019-01-30 Thread Doug Ewell via Unicode
Kent Karlsson wrote: > Yes, great. But as I've said, we've ALREADY got a > default-ignorable-in-display (if implemented right) > way of doing such things. > > And not only do we already have one, but it is also > standardised in multiple standards from different > standards institutions. See for

Re: Proposal for BiDi in terminal emulators

2019-02-01 Thread Doug Ewell via Unicode
Richard Wordingham wrote: > Language tagging is already available in Unicode, via the tag > characters in the deprecated plane. Plane 14 isn't deprecated -- that isn't a property of planes -- and the tag characters U+E0020 through U+E007E have been un-deprecated for use with emoji flags. Only
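To show how the tag characters U+E0020 through U+E007E are used with emoji flags, here is a hedged sketch that builds an emoji tag sequence. The function name `flag_tag_sequence` is hypothetical, and the example subdivision code `gbsct` (Scotland) is chosen only for illustration; whether a given sequence is displayed as a flag depends on the font and platform.

```python
TAG_BASE = 0xE0000                 # tag character = TAG_BASE + ASCII value
CANCEL_TAG = "\U000E007F"          # U+E007F CANCEL TAG terminates the sequence
WAVING_BLACK_FLAG = "\U0001F3F4"   # U+1F3F4, the base character for flag tag sequences


def flag_tag_sequence(subdivision):
    """Build base flag + tag characters + CANCEL TAG for a subdivision code."""
    code = subdivision.lower()
    for ch in code:
        # Only ASCII 0x20..0x7E has a counterpart in U+E0020..U+E007E.
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"no tag character for {ch!r}")
    tags = "".join(chr(TAG_BASE + ord(ch)) for ch in code)
    return WAVING_BLACK_FLAG + tags + CANCEL_TAG


scotland = flag_tag_sequence("gbsct")
print([f"U+{ord(ch):04X}" for ch in scotland])
```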

Use of tag characters in emoji sequences (was: Re: Proposal for BiDi in terminal emulators)

2019-02-02 Thread Doug Ewell via Unicode
Philippe Verdy wrote: > Actually not all U+E0020 through U+E007E are "un-deprecated" for this > use. Characters in Unicode are not "deprecated" for some purposes and not for others. "Deprecated" is a clearly defined property in Unicode. The only reference that matters here is in PropList.txt:

Re: Proposal for BiDi in terminal emulators

2019-02-02 Thread Doug Ewell via Unicode
Richard Wordingham wrote: > Unicode may not deprecate the tag characters, but the characters of > Plane 14 are widely deplored, despised or abhorred. That is why I > think of it as the deprecated plane. Think of it as the deplored plane, then, or the despised plane or the abhorred plane or the

Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Doug Ewell via Unicode
Egmont Koblinger wrote: > "Basic Arabic shaping, at the level of a typewriter, is > straightforward enough to be implemented in the application, using > presentation form characters, as I suggest". Could you please point > out the problems with this statement? As multiple people have pointed

RE: Encoding italic

2019-01-31 Thread Doug Ewell via Unicode
Kent Karlsson wrote: > ITU T.416/ISO/IEC 8613-6 defines general RGB & CMY(K) colour control > sequences, which are deferred in ECMA-48/ISO 6429. (The RGB one > is implemented in Cygwin (sorry for mentioning a product name).) Fair enough. This thread is mostly about italics and bold and such,

Re: Does "endian-ness" apply to UTF-8 characters that use multiple bytes?

2019-02-04 Thread Doug Ewell via Unicode
http://www.unicode.org/faq/utf_bom.html#utf8-2 -- Doug Ewell | Thornton, CO, US | ewellic.org
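For readers who want to see the point of the FAQ entry concretely, this short sketch (assuming Python's built-in codecs) compares the byte sequences for one character. UTF-8 is defined as a sequence of bytes, so there is only one possible order; UTF-16 uses 16-bit code units, whose two bytes can be serialized either way.

```python
text = "\u20ac"  # EURO SIGN

utf8 = text.encode("utf-8")          # one fixed byte sequence
utf16_be = text.encode("utf-16-be")  # big-endian serialization
utf16_le = text.encode("utf-16-le")  # little-endian serialization

print(utf8.hex(" "))      # e2 82 ac
print(utf16_be.hex(" "))  # 20 ac
print(utf16_le.hex(" "))  # ac 20
```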

Re: Proposal to extend the U+1F4A9 Symbol

2019-06-01 Thread Doug Ewell via Unicode
bristol_poo wrote: > This would produce 7 variants of the U+1F4A9 emoji, including existing > (Which I believe is about Type 4 on the scale). > > Why? I think this would really benefit the medical profession, with a > large uptick in e-doctor/text only conversations towards the medical >

RE: Proposal to extend the U+1F4A9 Symbol

2019-06-01 Thread Doug Ewell via Unicode
Andrew West wrote: > oh, there is no Wikidata QID for phone dropped in the toilet. It's Wikidata, right? Pretty much anyone can create an item for pretty much anything, right? Problem solved. -- Doug Ewell | Thornton, CO, US | ewellic.org

RE: Proposal to extend the U+1F4A9 Symbol

2019-06-01 Thread Doug Ewell via Unicode
Tex wrote: > What I would find useful is an emoji for when my phone falls into the > toilet. I would have thought ⤵ would be sufficient. But I didn't include any variation selectors and combining sequences for the gender, skin color, hair style, profession, and current state of mind of the

Format A

2019-05-30 Thread Doug Ewell via Unicode
Apologies if this is a repeat of a (much) earlier inquiry. The mapping tables that are available as part of the Unicode Standard (http://www.unicode.org/Public/MAPPINGS/) are generally provided in a text format called "Format A." Each line in the file defines a mapping between a character in a

RE: Unicode "no-op" Character?

2019-06-22 Thread Doug Ewell via Unicode
Sławomir Osipiuk wrote: > Does Unicode include a character that does nothing at all? I'm talking > about something that can be used for padding data without affecting > interpretation of other characters, including combining chars and > ligatures. I join Shawn Steele in wondering what your "data

Re: Is ARMENIAN ABBREVIATION MARK (՟, U+055F) misclassified?

2019-04-26 Thread Doug Ewell via Unicode
Fredrick Brennan wrote: > Although my research on this has by no means been exhaustive, it > seems at a cursory glance that the «pativ», the Armenian abbreviation > mark, is misclassified; it seems it should either be itself a > combining mark or have a combining mark version. > > I have not been

Re: Symbols of colors used in Portugal for transport

2019-04-29 Thread Doug Ewell via Unicode
Philippe Verdy wrote: > A very useful thing to add to Unicode (for colorblind people)! > > http://bestinportugal.com/color-add-project-brings-color-identification-to-the-color-blind > > Is it proposed to add as new symbols? Well, it isn't proposed until someone proposes it. At first I
