Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Mark Davis ☕️
I think the term "non-ASCII Unicode" is just fine, and we don't need anything beyond that. It is clearly those Unicode characters that aren't (2) in http://unicode.org/glossary/#ASCII. Mark *The best is the enemy of the good.* On Tue, Sep 29, 2015 at 6:20 PM,
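Taken literally, the definition above is a one-line test on the code point. Below is a minimal Python sketch. It assumes a single-character string as input, and the name `is_non_ascii` is hypothetical, chosen for illustration. The sketch cross-checks itself against the built-in `str.isascii()`, available in Python 3.7 and later:

```python
ASCII_MAX = 0x7F  # last code point of ASCII (U+0000..U+007F)


def is_non_ascii(ch: str) -> bool:
    """Return True if the single character `ch` is outside U+0000..U+007F."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return ord(ch) > ASCII_MAX


# Cross-check against the standard library on a few boundary samples.
for sample in ["A", "~", "\x7f", "\x80", "é", "☕", "\U0001F600"]:
    assert is_non_ascii(sample) == (not sample.isascii())
```

For example, `is_non_ascii("é")` is `True`, while `is_non_ascii("\x7f")` (DELETE) is `False`.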

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Daniel Bünzli
I would say there's already enough terminology in the Unicode world to add more to it. This thread already hinted at enough ways of expressing what you'd like, the simplest one being "scalar values greater than U+001F". This is the clearest you can come up with and anybody who has basic

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Sean Leonard
On 9/21/2015 5:17 PM, Peter Constable wrote: If you think it's a serious problem that there isn't one conventional term for "characters outside the ASCII repertoire" or "UTF-8 multi-code-unit encoded representations" (since different authors could devise different terminology solutions), then
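The "UTF-8 multi-code-unit" phrasing picks out the same set as "non-ASCII". In UTF-8, every scalar value above U+007F takes two to four bytes, and every value at or below it takes exactly one. The small Python check below tests that claim by sweeping the whole code space. It skips the surrogates, which UTF-8 cannot encode. The helper names are illustrative only:

```python
def utf8_length(cp: int) -> int:
    """Number of UTF-8 code units (bytes) used to encode scalar value `cp`."""
    return len(chr(cp).encode("utf-8"))


def scalar_values():
    """Yield every Unicode scalar value: all code points except surrogates."""
    for cp in range(0x110000):
        if not 0xD800 <= cp <= 0xDFFF:
            yield cp


# Multi-byte in UTF-8 exactly when the scalar value is above U+007F.
assert all((utf8_length(cp) > 1) == (cp > 0x7F) for cp in scalar_values())
```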

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Daniel Bünzli
On Tuesday, 29 September 2015 at 21:03, Richard Wordingham wrote: > Too wordy and clearly prone to error! Yes and maybe that "average engineer" does not understand negation. So clearly any of non-ASCII, non-Basic Latin or greater than U+007F cannot fit. Bring in the bureaucrats, new terminology

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Richard Wordingham
On Tue, 29 Sep 2015 20:27:28 +0100 Daniel Bünzli wrote: > On Tuesday, 29 September 2015 at 19:50, Ken Whistler wrote: > > I agree that "scalar values greater than U+007F" doesn't just trip > > off the tongue, and while technically accurate, it is bad > > terminology

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Sean Leonard
On 9/29/2015 12:27 PM, Daniel Bünzli wrote: On Tuesday, 29 September 2015 at 19:50, Ken Whistler wrote: I agree that "scalar values greater than U+007F" doesn't just trip off the tongue, and while technically accurate, it is bad terminology -- precisely because it begs the question "wtf are

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Asmus Freytag (t)
On 9/29/2015 8:40 PM, Sean Leonard wrote: I like the definition of "character" in ASCII: 3.3 Character. A member of a set of elements used for the organization, control, or representation of data. This, by the way, is the

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Sean Leonard
On 9/29/2015 11:50 AM, Ken Whistler wrote: On 9/29/2015 10:30 AM, Sean Leonard wrote: On 9/29/2015 9:40 AM, Daniel Bünzli wrote: I would say there's already enough terminology in the Unicode world to add more to it. This thread already hinted at enough ways of expressing what you'd like,

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Sean Leonard
On 9/29/2015 9:40 AM, Daniel Bünzli wrote: I would say there's already enough terminology in the Unicode world to add more to it. This thread already hinted at enough ways of expressing what you'd like, the simplest one being "scalar values greater than U+001F". This is the clearest you can

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Daniel Bünzli
On Tuesday, 29 September 2015 at 18:30, Sean Leonard wrote: > Uh...I think you mean U+007F? :) Yes… see how it was easy to point out that the definition was wrong. It would also have been, if this was code and we were talking about a protocol whose specification was using this notation rather
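The slip is easy to quantify. The short Python sketch below counts how many ASCII code points the "greater than U+001F" wording would wrongly call non-ASCII. The names in it are hypothetical:

```python
BUGGY_BOUND = 0x1F    # the bound as first written
CORRECT_BOUND = 0x7F  # the intended last ASCII code point


def misclassified_ascii() -> list:
    """ASCII code points that "greater than BUGGY_BOUND" labels non-ASCII."""
    return [cp for cp in range(CORRECT_BOUND + 1) if cp > BUGGY_BOUND]


wrong = misclassified_ascii()
print(len(wrong), hex(wrong[0]), hex(wrong[-1]))  # 96 0x20 0x7f
```

That is every printable ASCII character plus DELETE. Only the C0 controls would still count as ASCII.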

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Ken Whistler
On 9/29/2015 10:30 AM, Sean Leonard wrote: On 9/29/2015 9:40 AM, Daniel Bünzli wrote: I would say there's already enough terminology in the Unicode world to add more to it. This thread already hinted at enough ways of expressing what you'd like, the simplest one being "scalar values greater

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Richard Wordingham
On Tue, 29 Sep 2015 17:40:47 +0100 Daniel Bünzli wrote: > I would say there's already enough terminology in the Unicode world > to add more to it. This thread already hinted at enough ways of > expressing what you'd like, the simplest one being "scalar values >

Re: Concise term for non-ASCII Unicode characters

2015-09-28 Thread Sean Leonard
To follow up on this thread: It appears that ASCII is in fact a defined term in the Unicode glossary, and this term is sufficiently broad. http://unicode.org/glossary/#ASCII ASCII is sufficient to identify the range 0 - 127, whether that is simply a "range", "characters", "code points", or
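Used as a range, ASCII is also easy to express in code. The minimal Python sketch below uses the standard `re` module and assumes the goal is to locate the non-ASCII parts of a string. The helper name is illustrative:

```python
import re

# One or more consecutive characters outside U+0000..U+007F.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7F]+")


def non_ascii_runs(text: str) -> list:
    """Return the maximal runs of non-ASCII characters in `text`."""
    return NON_ASCII_RUN.findall(text)


print(non_ascii_runs("naïve café ☕"))  # ['ï', 'é', '☕']
```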

Re: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Richard Wordingham
On Tue, 22 Sep 2015 08:34:14 -0700 "Doug Ewell" wrote: > That's why I wrote "non Basic Latin." > > But I realize that not all fonts will show this clearly, and that the > distinction is lost in speech anyway. I think the difference is actually clearer in speech. Richard.

Re: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Sean Leonard
On 9/21/2015 9:24 PM, Janusz S. Bien wrote: Quote - Sean Leonard (Mon 21 Sep 2015 10:51:42 PM CEST): Related question as I am researching this: How can I acquire (cheaply or free) the latest and most official copy of US-ASCII, namely, the version that

Re: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Richard Wordingham
On Sun, 20 Sep 2015 16:52:29 +0000 Peter Constable wrote: > You already have been using "non-ASCII Unicode", which is about as > concise and sufficiently accurate as you'll get. There's no term > specifically defined in any standard or conventionally used for this. As to

Re: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Philippe Verdy
I would not use the "clumsy 7-bit ASCII" because it has long been confusing: it could refer to any national version of ISO 646, which reassigns some code positions in the range 0x00 to 0x7F to characters outside the range U+0000 to U+007F while still remaining a 7-bit encoding. So
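To make the point concrete, a 7-bit national variant can decode the same bytes into non-ASCII characters. The sketch below uses a partial, hand-written table for the German variant (DIN 66003), listed for illustration only. Python's standard library has no ISO 646 national-variant codecs, so `decode_german_646` is a hypothetical helper, not a real codec:

```python
# Partial table: positions where the German ISO 646 variant (DIN 66003)
# differs from US-ASCII. Illustrative only; not a complete codec.
GERMAN_646_OVERRIDES = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}


def decode_german_646(data: bytes) -> str:
    """Hypothetical decoder: 7-bit bytes, with the German overrides applied."""
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"0x{b:02X} is not a 7-bit code")
        chars.append(GERMAN_646_OVERRIDES.get(b, chr(b)))
    return "".join(chars)


raw = b"Stra\x7Ee"                 # six bytes, all below 0x80
print(raw.decode("ascii"))         # Stra~e
print(decode_german_646(raw))      # Straße
```

Every byte is 7-bit, yet the German reading contains "ß" (U+00DF), a non-ASCII character. This is the ambiguity behind "7-bit ASCII".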

Re: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Sean Leonard
On 9/22/2015 2:27 AM, Sean Leonard wrote: Overall, the takeaway is that specifying ISO/IEC 646 / ECMA-6 is not sufficient; you need to include "IRV" as well, or ISO IR No. 6 for the G0 set and ISO IR No. 1 for the C0 set. ...which the Unicode Standard does specify, by stating "IRV" explicitly

Re: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Sean Leonard
On 9/22/2015 1:45 AM, Philippe Verdy wrote: I would not use the "clumsy 7-bit ASCII" because it has long been confusing: it could refer to any national version of ISO 646, which reassigns some code positions in the range 0x00 to 0x7F to characters outside the range U+0000 to

RE: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Peter Constable
, 2015 12:51 AM To: unicode@unicode.org Subject: Re: Concise term for non-ASCII Unicode characters On Sun, 20 Sep 2015 16:52:29 +0000 Peter Constable <peter...@microsoft.com> wrote: > You already have been using "non-ASCII Unicode", which is about as > concise and suff

Re: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Doug Ewell
Martin J. Dürst wrote: >> I was thinking that something like "non–Basic-Latin Unicode" might be > > Is that non-Basic Latin or not Basic-Latin? > >> useful. It avoids the confusion of referring to ASCII as a range of >> code points instead of a separate encoding standard. > > But as a

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Janusz S. Bien
Quote - Sean Leonard (Mon 21 Sep 2015 10:51:42 PM CEST): Related question as I am researching this: How can I acquire (cheaply or free) the latest and most official copy of US-ASCII, namely, the version that Unicode references? [...] I've never seen the

RE: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Tony Jollans
: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Sean Leonard Sent: 21 September 2015 09:22 To: unicode@unicode.org Subject: Re: Concise term for non-ASCII Unicode characters First of all, thank you all for the responses thus far. On 9/20/2015 5:51 PM, Martin J. Dürst wrote: > Hello Se

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Daniel Bünzli
On Monday, 21 September 2015 at 09:22, Sean Leonard wrote: > I think we can limit our inquiry to "characters" and "code points". Both > of those are well-defined in Unicode (see > ). I wouldn't say so. If you actually have a look at the definition for character

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Martin J. Dürst
Hello Doug, On 2015/09/22 00:42, Doug Ewell wrote: I was thinking that something like "non–Basic-Latin Unicode" might be Is that non-Basic Latin or not Basic-Latin? useful. It avoids the confusion of referring to ASCII as a range of code points instead of a separate encoding standard.

RE: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Peter Constable
Check here: http://webstore.ansi.org/RecordDetail.aspx?sku=INCITS+4-1986%5bR2012%5d -Original Message- From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Sean Leonard Sent: Monday, September 21, 2015 1:52 PM To: unicode@unicode.org Subject: Re: Concise term for non-ASCII

RE: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Peter Constable
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Sean Leonard Sent: Monday, September 21, 2015 1:22 AM > Well what I am getting at is that when writing standards documents in various > SDOs (or any other > computer science text, for that matter), it is helpful to identify these >

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Richard Wordingham
On Mon, 21 Sep 2015 20:54:23 +0100 "Tony Jollans" wrote: > Windows code pages and their ilk predate Unicode, and I would only > ever expect to see them used in environments where legacy support is > needed, and would not expect a significant amount of new > documentation about

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Doug Ewell
Sean Leonard wrote: > Additionally as Peter stated, an expression including "Basic Latin > block" (e.g., characters beyond the Basic Latin block) could work. I was thinking that something like "non–Basic-Latin Unicode" might be useful. It avoids the confusion of referring to ASCII as a range of

RE: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Tony Jollans
@unicode.org Subject: Re: Concise term for non-ASCII Unicode characters On Mon, 21 Sep 2015 12:46:48 +0100 "Tony Jollans" <t...@jollans.com> wrote: > These days, it is pretty sloppy coding that cares how many bytes an > encoding of something requires, although there may be many

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Richard Wordingham
On Mon, 21 Sep 2015 12:46:48 +0100 "Tony Jollans" wrote: > These days, it is pretty sloppy coding that cares how many bytes an > encoding of something requires, although there may be many > circumstances where legacy support is required. Wow! Are you saying that code chopping
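Byte counts still matter for one concrete reason. Cutting UTF-8 at an arbitrary byte offset can split a multi-byte sequence and leave invalid data. The minimal Python sketch below truncates safely at a character boundary. The function name is hypothetical, and the sketch relies only on the UTF-8 rule that continuation bytes have the bit pattern `10xxxxxx`:

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode `text` as UTF-8 and cut it to at most `max_bytes` bytes
    without splitting a multi-byte sequence."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # data[cut] is the first byte we would drop; if it is a continuation
    # byte (0b10xxxxxx), the cut falls inside a character, so back up.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


naive = "héllo".encode("utf-8")[:2]   # b'h\xc3': a lead byte with no continuation
safe = truncate_utf8("héllo", 2)      # b'h'
print(safe.decode("utf-8"))           # h
```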

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Philippe Verdy
2015-09-21 21:54 GMT+02:00 Tony Jollans : > The actual octets are, of course, used in combinations, but not singly in > any way that requires them to be described in Unicode terms. Or am I > missing > something fundamental? > The term you are looking for are described in the

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Philippe Verdy
You actually don't need any copy to work with it: U+0000 to U+007F are directly bound to US-ASCII. Unicode describes these characters with character properties (and representative glyphs only for the range U+0020..U+007E; the "C0" controls, in U+0000 to U+001F and U+007F, have a pseudo-glyph in
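This split can be checked against the Unicode Character Database that ships with Python (`unicodedata`). Within U+0000..U+007F, general category `Cc` (control) covers exactly U+0000..U+001F and U+007F. That leaves the 95 code points U+0020..U+007E, space included, which have representative glyphs:

```python
import unicodedata

for cp in range(0x80):
    is_control = unicodedata.category(chr(cp)) == "Cc"
    # Within U+0000..U+007F, "Cc" is exactly the C0 controls plus DELETE.
    assert is_control == (cp <= 0x1F or cp == 0x7F)

# Everything else in the range, U+0020..U+007E, has a representative glyph.
with_glyphs = [cp for cp in range(0x80) if unicodedata.category(chr(cp)) != "Cc"]
print(len(with_glyphs), hex(with_glyphs[0]), hex(with_glyphs[-1]))  # 95 0x20 0x7e
```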

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Sean Leonard
Related question as I am researching this: How can I acquire (cheaply or free) the latest and most official copy of US-ASCII, namely, the version that Unicode references? The Unicode Standard 8.0 refers to the following document: ANSI X3.4: American National Standards Institute. Coded

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Sean Leonard
First of all, thank you all for the responses thus far. On 9/20/2015 5:51 PM, Martin J. Dürst wrote: Hello Sean, On 2015/09/20 23:48, Sean Leonard wrote: What is the most concise term for characters or code points So we already have two different things we might need a term for. [...]

RE: Concise term for non-ASCII Unicode characters

2015-09-20 Thread Phillips, Addison
de.org] On Behalf Of Peter > Constable > Sent: Sunday, September 20, 2015 9:52 AM > To: Sean Leonard; unicode@unicode.org > Subject: RE: Concise term for non-ASCII Unicode characters > > You already have been using "non-ASCII Unicode", which is about as concise > and sufficie

Re: Concise term for non-ASCII Unicode characters

2015-09-20 Thread Martin J. Dürst
Hello Sean, On 2015/09/20 23:48, Sean Leonard wrote: What is the most concise term for characters or code points So we already have two different things we might need a term for. outside of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as "extended characters"

Concise term for non-ASCII Unicode characters

2015-09-20 Thread Sean Leonard
What is the most concise term for characters or code points outside of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as "extended characters" or "non-ASCII Unicode" but I do not find those terms precise. We are talking about the code points U+0080 - U+10FFFF. I
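Whether the term covers "code points" or only scalar values changes the size of the set, because scalar values exclude the surrogate code points U+D800..U+DFFF. A quick Python calculation of both counts:

```python
CODESPACE_END = 0x110000                # one past U+10FFFF
FIRST_NON_ASCII = 0x80                  # U+0080
SURROGATES = range(0xD800, 0xE000)      # U+D800..U+DFFF

non_ascii_code_points = CODESPACE_END - FIRST_NON_ASCII
non_ascii_scalar_values = non_ascii_code_points - len(SURROGATES)

print(non_ascii_code_points)    # 1113984
print(non_ascii_scalar_values)  # 1111936
```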

Re: Concise term for non-ASCII Unicode characters

2015-09-20 Thread Steve Swales
--- >> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter >> Constable >> Sent: Sunday, September 20, 2015 9:52 AM >> To: Sean Leonard; unicode@unicode.org >> Subject: RE: Concise term for non-ASCII Unicode characters >> >> You already have be

RE: Concise term for non-ASCII Unicode characters

2015-09-20 Thread Peter Constable
nicode.org] On Behalf Of Sean Leonard Sent: Sunday, September 20, 2015 7:48 AM To: unicode@unicode.org Subject: Concise term for non-ASCII Unicode characters What is the most concise term for characters or code points outside of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as

RE: Concise term for non-ASCII Unicode characters

2015-09-20 Thread Peter Constable
ve Swales [mailto:st...@swales.us] Sent: Sunday, September 20, 2015 11:00 AM To: Phillips, Addison <addi...@lab126.com> Cc: Peter Constable <peter...@microsoft.com>; Sean Leonard <lists+unic...@seantek.com>; unicode@unicode.org Subject: Re: Concise term for non-ASCII Unicode characters

Re: Concise term for non-ASCII Unicode characters

2015-09-20 Thread Daniel Bünzli
On Sunday, 20 September 2015 at 18:59, Steve Swales wrote: > Exactly. I think the reason that non-ASCII feels non-concise is that there is > widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is > widely confused with Windows-1252). For this reason I usually use the
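The ASCII / Latin-1 / Windows-1252 confusion shows up as soon as a byte above 0x7F is decoded. Here is a small Python illustration using the standard codecs `ascii`, `latin-1` and `cp1252`:

```python
data = bytes([0x80, 0xE9])  # two bytes above 0x7F

print(repr(data.decode("latin-1")))  # '\x80é'  (U+0080 is a C1 control)
print(repr(data.decode("cp1252")))   # '€é'     (0x80 is the euro sign here)

try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    # ASCII defines nothing above 0x7F, so strict decoding fails.
    print("not ASCII:", exc.reason)
```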