C1 Control Pictures Proposal

2011-08-13 Thread Sean Leonard
Control Pictures to Unicode. It is being proposed by me, Sean Leonard, with the advice and +1 of Frank da Cruz. Many years ago (in 1998), Frank da Cruz proposed a large number of additional characters for terminal emulation and the like, which can be found on the web and in the mail list

Re: C1 Control Pictures Proposal

2011-08-17 Thread Sean Leonard
On Aug 13, 2011, at 10:48 AM, Sean Leonard wrote: Greetings--hi all, I'm a new poster. I read on the unicode.org website that a good way to gauge interest and get a proposal through the process is to gather feedback and comments here before investing the time in a formal proposal, so, here

Re: C1 Control Pictures Proposal

2011-08-21 Thread Sean Leonard
Hi Ken et al., On Aug 17, 2011, at 2:49 PM, Ken Whistler wrote: Further comments: On 8/13/2011 10:48 AM, Sean Leonard wrote: In accordance with this and other text in the Standard, it is not really possible to assign glyphs uniformly and interchangeably to the code points in U+0000-U

Re: C1 Control Pictures Proposal

2011-08-22 Thread Sean Leonard
On Aug 17, 2011, at 4:38 PM, Andrew West wrote: Unless you can show evidence that C1 control pictures are currently in use and that there is a clear demand from the user community to On Aug 21, 2011, at 10:13 AM, Doug Ewell wrote: Perhaps it would help for you to do a quick survey of

Re: A Bulldog moves on

2015-10-24 Thread Sean Leonard
A very sad day in the history of this community. I learned a lot about Unicode, and about internationalization and localization on Windows, directly through his posts. And now, having done a bit of research, it looks like he left the Internet a gift with some recent blog posts about quite a

Re: Why Nothing Ever Goes Away

2015-10-09 Thread Sean Leonard
:24 GMT+02:00 Sean Leonard <lists+unic...@seantek.com>: 2. The Unicode code charts are (deliberately) vague about U+0080, U+0081, and U+0099. All other C1 control codes have aliases to the ISO 6429 se

Pictorial Representations of BS and DEL

2015-10-09 Thread Sean Leonard
Hello: As we continue to riff on the history of character encodings, I am searching for the most accurate standards-based pictorial representations of BS (U+0008) and DEL (U+007F) in Unicode. ECMA-17:1968 and ANSI X3.32-1973 depict U+0008 as an arrow pointing from the bottom-right to the

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Sean Leonard
On 9/21/2015 5:17 PM, Peter Constable wrote: If you think it's a serious problem that there isn't one conventional term for "characters outside the ASCII repertoire" or "UTF-8 multi-code-unit encoded representations" (since different authors could devise different terminology solutions), then

Re: Concise term for non-ASCII Unicode characters

2015-09-28 Thread Sean Leonard
To follow up on this thread: It appears that ASCII is in fact a defined term in the Unicode glossary, and this term is sufficiently broad. http://unicode.org/glossary/#ASCII ASCII is sufficient to identify the range 0 - 127, whether that is simply a "range", "characters", "code points", or

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Sean Leonard
On 9/29/2015 12:27 PM, Daniel Bünzli wrote: Le mardi, 29 septembre 2015 à 19:50, Ken Whistler a écrit : I agree that "scalar values greater than U+007F" doesn't just trip off the tongue, and while technically accurate, it is bad terminology -- precisely because it begs the question "wtf are

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Sean Leonard
On 9/29/2015 11:50 AM, Ken Whistler wrote: On 9/29/2015 10:30 AM, Sean Leonard wrote: On 9/29/2015 9:40 AM, Daniel Bünzli wrote: I would say there's already enough terminology in the Unicode world to add more to it. This thread already hinted at enough ways of expressing what you'd like

Re: Acquiring DIS 10646

2015-10-03 Thread Sean Leonard
Thanks. Well, "DIS 10646" is the Draft International Standard, particularly Draft 1, from ~1990 or ~1991. (Sometimes it might have been called 10646.1.) Therefore it would likely only be in print form (or printed and scanned form). It's pretty old. What I understand is that Draft 1 got shot

Re: Acquiring DIS 10646

2015-10-04 Thread Sean Leonard
On 10/3/2015 12:28 PM, Asmus Freytag (t) wrote: On 10/3/2015 8:15 AM, Sean Leonard wrote: Thanks. Well, "DIS 10646" is the Draft International Standard, particularly Draft 1, from ~1990 or ~1991. (Sometimes it might have been called 10646.1.) Therefore it would likely only be in

Acquiring DIS 10646

2015-10-02 Thread Sean Leonard
As part of yet more research, I would like to get a hold of DIS 10646, aka Draft International Standard ISO/IEC 10646.1 (circa 1990 or 1991). I understand that Draft 2 (10646.2) was accepted and therefore became ISO/IEC 10646-1:1993. Therefore, I am looking for a copy (preferably free,

Re: Concise term for non-ASCII Unicode characters

2015-09-29 Thread Sean Leonard
On 9/29/2015 9:40 AM, Daniel Bünzli wrote: I would say there's already enough terminology in the Unicode world to add more to it. This thread already hinted at enough ways of expressing what you'd like, the simplest one being "scalar values greater than U+007F". This is the clearest you can

Concise term for non-ASCII Unicode characters

2015-09-20 Thread Sean Leonard
What is the most concise term for characters or code points outside of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as "extended characters" or "non-ASCII Unicode" but I do not find those terms precise. We are talking about the code points U+0080 - U+10FFFF. I
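Whatever name the thread settles on, the range itself can be tested mechanically. A minimal Python sketch, assuming "non-ASCII" means any Unicode scalar value above U+007F (U+0080..U+10FFFF, minus the surrogate code points U+D800..U+DFFF); the function names are hypothetical:

```python
def is_scalar_value(cp: int) -> bool:
    """True for U+0000..U+10FFFF, excluding surrogate code points."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_non_ascii(cp: int) -> bool:
    """True for scalar values outside the ASCII range U+0000..U+007F."""
    return is_scalar_value(cp) and cp > 0x7F


def non_ascii_chars(text: str) -> list[str]:
    """Collect the characters of `text` that fall outside ASCII."""
    return [ch for ch in text if is_non_ascii(ord(ch))]


if __name__ == "__main__":
    print(non_ascii_chars("na\u00efve caf\u00e9"))  # ['ï', 'é']
```

The surrogate check matters in Python because a `str` can hold lone surrogates, which are code points but not scalar values.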

Re: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Sean Leonard
On 9/21/2015 9:24 PM, Janusz S. Bien wrote: Quote/Cytat - Sean Leonard <lists+unic...@seantek.com> (Mon 21 Sep 2015 10:51:42 PM CEST): Related question as I am researching this: How can I acquire (cheaply or free) the latest and most official copy of US-ASCII, namely, the v

Re: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Sean Leonard
On 9/22/2015 2:27 AM, Sean Leonard wrote: Overall, the takeaway is that specifying ISO/IEC 646 / ECMA-6 is not sufficient; you need to include "IRV" as well, or ISO IR No. 6 for the G0 set and ISO IR No. 1 for the C0 set. ...which the Unicode Standard does specify, by stating "

Re: Concise term for non-ASCII Unicode characters

2015-09-22 Thread Sean Leonard
On 9/22/2015 1:45 AM, Philippe Verdy wrote: I would not use the "clumsy 7-bit ASCII" due to the confusion it has long created, since it could refer to any national version of ISO 646, which reassigns some code positions in the range 0x00 to 0x7F to other characters outside the range U+0000 to

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Sean Leonard
Related question as I am researching this: How can I acquire (cheaply or free) the latest and most official copy of US-ASCII, namely, the version that Unicode references? The Unicode Standard 8.0 refers to the following document: ANSI X3.4: American National Standards Institute. Coded

Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Sean Leonard
First of all, thank you all for the responses thus far. On 9/20/2015 5:51 PM, Martin J. Dürst wrote: Hello Sean, On 2015/09/20 23:48, Sean Leonard wrote: What is the most concise term for characters or code points So we already have two different things we might need a term

Re: Why Nothing Ever Goes Away

2015-10-06 Thread Sean Leonard
2. The Unicode code charts are (deliberately) vague about U+0080, U+0081, and U+0099. All other C1 control codes have aliases to the ISO 6429 set of control functions, but in ISO 6429, those three control codes don't have any assigned functions (or names). On 10/5/2015 3:57 PM, Philippe Verdy

Unicode password mapping for crypto standard

2016-01-04 Thread Sean Leonard
Hi Unicode list, I am looking for feedback on this proposal, specifically a standard specification to map between (presumably) Unicode text strings and octet strings. A "password" is defined as an arbitrary octet string in a number of protocols and formats. This has worked for basic cases

Re: Unicode password mapping for crypto standard

2016-01-09 Thread Sean Leonard
On 1/5/2016 8:37 AM, Stephane Bortzmeyer wrote: On Mon, Jan 04, 2016 at 09:30:32PM -0800, Sean Leonard <lists+unic...@seantek.com> wrote a message of 120 lines which said: how to take the Unicode input and get a consistent and reasonable stream of bits out on both ends. For example:

Re: Unicode password mapping for crypto standard

2016-01-09 Thread Sean Leonard
On 1/5/2016 8:26 AM, Markus Scherer wrote: I would specify that UTF-8 must be used, without mapping. US-ASCII is a proper subset, so need not be mentioned explicitly, nor distinguished in the protocol. Mappings would require that all implementations carry relevant data, and are up to date to
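The suggestion quoted above (require UTF-8, with no mapping) can be sketched in Python. The function name is hypothetical. Because no normalization is applied, the sketch also shows the consequence raised in the thread: canonically equivalent inputs such as precomposed U+00E9 and "e" + U+0301 yield different octet strings.

```python
def password_to_octets(password: str) -> bytes:
    # Plain UTF-8, no normalization or case mapping. Lone surrogates
    # raise UnicodeEncodeError, since they are not scalar values.
    return password.encode("utf-8")


if __name__ == "__main__":
    # US-ASCII is a proper subset of UTF-8: ASCII passwords encode identically.
    assert password_to_octets("hunter2") == "hunter2".encode("ascii")
    print(password_to_octets("caf\u00e9"))   # b'caf\xc3\xa9'
    print(password_to_octets("cafe\u0301"))  # b'cafe\xcc\x81'
```

Whether the two "café" spellings should match is exactly what a mapping step (normalization, as in PRECIS-style profiles) would decide; this sketch deliberately leaves them distinct.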

U+hhhh[h[h]] NAME syntax

2016-08-13 Thread Sean Leonard
It appears that U+hhhh[h[h]] NAME syntax is a very common--one might say "standard"--way of representing a particular Unicode character or code point in text. It is the way that the Unicode Standard 9.0.0 refers to particular characters, and I have seen it around quite a bit. The Unicode
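One reading of the U+hhhh[h[h]] NAME convention, sketched in Python under stated assumptions: at least four uppercase hex digits (zero-padded), five or six only when the value needs them, optionally followed by a space and the character's Name property as reported by Python's `unicodedata` module. The helper names and the name pattern accepted by the parser are hypothetical choices, not a normative grammar; the parser is also lenient about redundant leading zeros (e.g. U+01F600).

```python
import re
import unicodedata
from typing import Optional

# "U+" then 4 to 6 uppercase hex digits, optionally a space and a NAME.
_NOTATION = re.compile(r"U\+([0-9A-F]{4,6})(?: ([A-Z0-9][A-Z0-9 \-]*))?")


def format_code_point(cp: int, with_name: bool = False) -> str:
    """Render cp as U+hhhh[h[h]], optionally followed by its character name."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a code point: {cp:#x}")
    text = f"U+{cp:04X}"  # pad to 4 digits; 5 or 6 appear only when needed
    if with_name:
        name = unicodedata.name(chr(cp), "")  # controls have no Name value
        if name:
            text += " " + name
    return text


def parse_code_point(text: str) -> tuple[int, Optional[str]]:
    """Parse 'U+hhhh[h[h]]' with an optional trailing NAME."""
    m = _NOTATION.fullmatch(text)
    if m is None:
        raise ValueError(f"not in U+hhhh[h[h]] NAME form: {text!r}")
    cp = int(m.group(1), 16)
    if cp > 0x10FFFF:
        raise ValueError(f"out of range: {text!r}")
    return cp, m.group(2)


if __name__ == "__main__":
    print(format_code_point(0x41, with_name=True))  # U+0041 LATIN CAPITAL LETTER A
    print(format_code_point(0x1F600))               # U+1F600
    print(parse_code_point("U+00E9 LATIN SMALL LETTER E WITH ACUTE"))
```

Control characters such as U+0008 come out without a name here, because their UCD Name property is empty; their familiar labels (e.g. BACKSPACE) are name aliases rather than names.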

Re: Whitespace characters in Unicode

2016-08-04 Thread Sean Leonard
taking a look at the definitions used by Unicode regexes, at http://unicode.org/reports/tr18/ . 2016-08-04 16:37 GMT-03:00 Sean Leonard <lists+unic...@seantek.com>: Hi Unicode Folks: I am trying to come up with a sensible sets o

Whitespace characters in Unicode

2016-08-04 Thread Sean Leonard
Hi Unicode Folks: I am trying to come up with a sensible set of characters that are considered whitespace or newlines in Unicode, and to understand the relative stability policy with respect to them. (This is for a formal syntax where the definition of "whitespace" matters, e.g., to separate

Re: Whitespace characters in Unicode

2016-08-07 Thread Sean Leonard
On 8/5/2016 10:07 AM, Markus Scherer wrote: On Fri, Aug 5, 2016 at 8:52 AM, Sean Leonard <lists+unic...@seantek.com> wrote: What makes a character a "whitespace" in Unicode, e.g., why are ZWSP and ZWNBSP not "whitespace"