Re: Emoji Space

2017-07-17 Thread Alastair Houghton via Unicode
On 17 Jul 2017, at 13:25, Christoph Päper via Unicode wrote: > > Finally, should smart fonts make U+0020 exactly as wide as an em when between > two emojis? I’ll leave it to others to answer the rest (I don’t know the answers to those), but the answer to this is clearly

Re: LATIN CAPITAL LETTER SHARP S officially recognized

2017-07-03 Thread Alastair Houghton via Unicode
On 2 Jul 2017, at 16:59, Jörg Knappen via Unicode wrote: > > > Is it possible to design fonts that will render ẞ as SS? > > In fact, that has happened long before the capital letter sharp s was added > to Unicode: The T1 encoding (aka Cork encoding) of LaTeX > does this

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-15 Thread Alastair Houghton via Unicode
On 15 May 2017, at 18:52, Asmus Freytag <asm...@ix.netcom.com> wrote: > > On 5/15/2017 8:37 AM, Alastair Houghton via Unicode wrote: >> On 15 May 2017, at 11:21, Henri Sivonen via Unicode <unicode@unicode.org> >> wrote: >>> In reference to: >>

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-18 Thread Alastair Houghton via Unicode
On 18 May 2017, at 01:04, Philippe Verdy via Unicode wrote: > > I find it intriguing that the update intends to enforce the decoding of the > **shortest** sequences, but now wants to treat **maximal sequences** as a > single unit with arbitrary length. UTF-8 was designed

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-18 Thread Alastair Houghton via Unicode
On 18 May 2017, at 06:01, Richard Wordingham via Unicode wrote: > > On Thu, 18 May 2017 02:04:55 +0200 > Philippe Verdy via Unicode wrote: > >> I find it intriguing that the update intends to enforce the decoding >> of the **shortest** sequences, but

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-18 Thread Alastair Houghton via Unicode
On 18 May 2017, at 07:18, Henri Sivonen via Unicode wrote: > > the decision complicates U+FFFD generation when validating UTF-8 by state > machine. It *really* doesn’t. Even if you’re hell bent on using a pure state machine approach, you need to add maybe two additional

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
On 16 May 2017, at 14:23, Hans Åberg via Unicode wrote: > > You don't. You have a filename, which is an octet sequence of unknown > encoding, and want to deal with it. Therefore, valid Unicode transformations > of the filename may result in it not being reachable. >

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
On 16 May 2017, at 16:44, Hans Åberg <haber...@telia.com> wrote: > > On 16 May 2017, at 17:30, Alastair Houghton via Unicode <unicode@unicode.org> > wrote: >> >> HFS(+), NTFS and VFAT long filenames are all encoded in some variation on >> UCS-2/UT

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
On 16 May 2017, at 17:07, Hans Åberg wrote: > HFS(+), NTFS and VFAT long filenames are all encoded in some variation on UCS-2/UTF-16. ... >>> >>> The filesystem directory is using octet sequences and does not bother >>> passing over an encoding, I am told.

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-17 Thread Alastair Houghton via Unicode
> On 16 May 2017, at 20:43, Richard Wordingham via Unicode > wrote: > > On Tue, 16 May 2017 11:36:39 -0700 > Markus Scherer via Unicode wrote: > >> Why do we care how we carve up an illegal sequence into subsequences? >> Only for debugging and visual

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
On 16 May 2017, at 17:23, Hans Åberg wrote: > > HFS implements case insensitivity in a layer above the filesystem raw > functions. So it is perfectly possible to have files that differ by case only > in the same directory by using low level function calls. The Tenon MachTen

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-15 Thread Alastair Houghton via Unicode
On 15 May 2017, at 11:21, Henri Sivonen via Unicode wrote: > > In reference to: > http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf > > I think Unicode should not adopt the proposed change. Disagree. An over-long UTF-8 sequence is clearly a single error. Emitting
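To make the "single error" point concrete, here is a minimal Python sketch that contrasts two ways of counting U+FFFD for the over-long sequence C0 AF. The helper names `expected_length` and `decode_structural` are hypothetical, and the grouping rule (lead byte plus the continuation bytes its bit pattern calls for) is a simplified assumption for illustration, not the wording of the proposal. The comparison uses CPython 3's built-in decoder with `errors="replace"`, which, as far as I know, emits one U+FFFD per offending byte in this case.

```python
REPLACEMENT = "\ufffd"


def expected_length(lead: int) -> int:
    """Length implied by the lead byte's bit pattern alone (0 if not a lead byte)."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 0  # stray continuation byte (80..BF) or F8..FF


def decode_structural(data: bytes) -> str:
    """Hypothetical policy: one U+FFFD per lead byte plus its continuation bytes."""
    out = []
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0:
            # Not a lead byte: a single error on its own.
            out.append(REPLACEMENT)
            i += 1
            continue
        # Collect up to n - 1 continuation bytes following the lead byte.
        j = i + 1
        while j < len(data) and j < i + n and 0x80 <= data[j] <= 0xBF:
            j += 1
        chunk = data[i:j]
        try:
            out.append(chunk.decode("utf-8"))  # strict: rejects over-long forms
        except UnicodeDecodeError:
            out.append(REPLACEMENT)  # whole unit counts as one error
        i = j
    return "".join(out)


overlong = b"A\xc0\xafB"  # C0 AF is an over-long encoding of "/"
grouped = decode_structural(overlong)
per_byte = overlong.decode("utf-8", errors="replace")
print(grouped.count(REPLACEMENT), per_byte.count(REPLACEMENT))
```

Both outputs are well-formed; they differ only in how many replacement characters stand in for the same ill-formed input.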

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
> On 16 May 2017, at 09:18, David Starner wrote: > > On Tue, May 16, 2017 at 12:42 AM Alastair Houghton > wrote: >> If you’re about to mutter something about security, consider this: security >> code *should* refuse to compare strings that

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
> On 16 May 2017, at 10:29, David Starner wrote: > > On Tue, May 16, 2017 at 1:45 AM Alastair Houghton > wrote: > That’s true anyway; imagine the database holds raw bytes, that just happen to > decode to U+FFFD. There might seem to be

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
On 16 May 2017, at 09:31, Henri Sivonen via Unicode wrote: > > On Tue, May 16, 2017 at 10:42 AM, Alastair Houghton > wrote: >> That would be true if the in-memory representation had any effect on what >> we’re talking about, but it really

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
On 16 May 2017, at 08:22, Asmus Freytag via Unicode wrote: > I therefore think that Henri has a point when he's concerned about tacit > assumptions favoring one memory representation over another, but I think the > way he raises this point is needlessly antagonistic. That

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
On 15 May 2017, at 23:16, Shawn Steele via Unicode wrote: > > I’m not sure how the discussion of “which is better” relates to the > discussion of ill-formed UTF-8 at all. It doesn’t, which is a point I made in my original reply to Henri. The only reason I answered his

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
On 15 May 2017, at 23:43, Richard Wordingham via Unicode wrote: > > The problem with surrogates is inadequate testing. They're sufficiently > rare for many users that it may be a long time before an error is > discovered. It's not always obvious that code is designed for

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-23 Thread Alastair Houghton via Unicode
> On 23 May 2017, at 18:45, Markus Scherer via Unicode > wrote: > > On Tue, May 23, 2017 at 7:05 AM, Asmus Freytag via Unicode > wrote: >> So, if the proposal for Unicode really was more of a "feels right" and not a >> "deviate at your peril"

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-23 Thread Alastair Houghton via Unicode
On 23 May 2017, at 07:10, Jonathan Coxhead via Unicode <unicode@unicode.org> wrote: > > On 18/05/2017 1:58 am, Alastair Houghton via Unicode wrote: >> On 18 May 2017, at 07:18, Henri Sivonen via Unicode <unicode@unicode.org> >> wrote: >> >>> th

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread Alastair Houghton via Unicode
On 16 May 2017, at 19:36, Markus Scherer wrote: > > Let me try to address some of the issues raised here. Thanks for jumping in. The one thing I wanted to ask about was the “without ever restricting trail bytes to less than 80..BF”. I think that could be misinterpreted;

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-06-02 Thread Alastair Houghton via Unicode
On 1 Jun 2017, at 19:44, Asmus Freytag via Unicode wrote: > > What's not OK is to take an existing recommendation and change it to > something else, just to make bug reports go away for one implementation. > That's like two sleepers fighting over a blanket that's too

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-06-01 Thread Alastair Houghton via Unicode
On 31 May 2017, at 20:24, Shawn Steele via Unicode wrote: > > > For implementations that emit FFFD while handling text conversion and > > repair (ie, converting ill-formed > > UTF-8 to well-formed), it is best for interoperability if they get the same > > results, so that

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-06-01 Thread Alastair Houghton via Unicode
On 31 May 2017, at 20:42, Shawn Steele via Unicode wrote: > >> And *that* is what the specification says. The whole problem here is that >> someone elevated >> one choice to the status of “best practice”, and it’s a choice that some of >> us don’t think *should* >> be

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Alastair Houghton via Unicode
On 31 May 2017, at 18:43, Shawn Steele via Unicode wrote: > > It is unclear to me what the expected behavior would be for this corruption > if, for example, there were merely a half dozen 0x80 in the middle of ASCII > text? Is that garbage a single "character"? Perhaps

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-31 Thread Alastair Houghton via Unicode
> On 30 May 2017, at 18:11, Shawn Steele via Unicode > wrote: > >> Which is to completely reverse the current recommendation in Unicode 9.0. >> While I agree that this might help you fending off a bug report, it would >> create chances for bug reports for Ruby, Python3,

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-06-01 Thread Alastair Houghton via Unicode
On 1 Jun 2017, at 10:32, Henri Sivonen via Unicode wrote: > > On Wed, May 31, 2017 at 10:42 PM, Shawn Steele via Unicode > wrote: >> * As far as I can tell, there are two (maybe three) sane approaches to this >> problem: >>* Either a "maximal"

Re: abstract characters, semantics, meaningful transformations ... Was: Tibetan Paluta

2017-05-01 Thread Alastair Houghton via Unicode
On 1 May 2017, at 15:19, Naena Guru via Unicode wrote: > > This whole attempt to make digitizing Indic script some esoteric, 'abstract', > 'semantic representation' and so on seems to me is an attempt to make Unicode > the realm of the some super humans. No. It’s

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Alastair Houghton via Unicode
On 5 Jun 2018, at 07:09, Martin J. Dürst via Unicode wrote: > > Hello Rebecca, > > On 2018/06/05 12:43, Rebecca T via Unicode wrote: > >> Something I’d love to see is translated keywords; shouldn’t be hard with a >> line in the cargo.toml for a rudimentary lookup. Again, I’m of the opinion >>

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Alastair Houghton via Unicode
On 4 Jun 2018, at 20:49, Manish Goregaokar via Unicode wrote: > > The Rust community is considering adding non-ascii identifiers, which follow > UAX #31 (XID_Start XID_Continue*, with tweaks). The proposal also asks for > identifiers to be treated as equivalent under NFKC. > > Are there any
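One way to probe the question in the subject line empirically is to normalize candidate identifiers and re-check them. The sketch below is in Python rather than Rust, and it rests on an assumption: Python's `str.isidentifier()` follows XID_Start/XID_Continue with Python's own tweaks, so it only approximates the UAX #31 rules the Rust proposal would use. The helper name `nfkc_identifier` is hypothetical.

```python
import unicodedata
from typing import Optional


def nfkc_identifier(name: str) -> Optional[str]:
    """Return the NFKC form of `name` if it is an identifier both before
    and after normalization, otherwise None.

    Assumption: str.isidentifier() stands in for the UAX #31 check.
    """
    if not name.isidentifier():
        return None
    normalized = unicodedata.normalize("NFKC", name)
    return normalized if normalized.isidentifier() else None


# A decomposed spelling and its precomposed form map to the same identifier.
print(nfkc_identifier("cafe\u0301") == nfkc_identifier("caf\u00e9"))
```

Running such a check over every code point that passes the identifier test would show whether any of them stop being identifiers after NFKC in a given Unicode version.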

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Alastair Houghton via Unicode
On 7 Jun 2018, at 15:51, Frédéric Grosshans via Unicode wrote: > >> IMO the major issue with non-ASCII identifiers is not a technical one, but >> rather that it runs the risk of fragmenting the developer community. >> Everyone can *type* ASCII and everyone can read Latin characters (for >>

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Alastair Houghton via Unicode
On 6 Jun 2018, at 17:50, Manish Goregaokar wrote: > > I think the recommendation to use ASCII as much as possible is implicit there. It would be a very good idea to make it explicit. Even for English speakers, there may be a temptation to use characters that are hard to distinguish or hard to

Re: Keyboard layouts and CLDR (was: Re: 0027, 02BC, 2019, or a new character?)

2018-01-30 Thread Alastair Houghton via Unicode
On 30 Jan 2018, at 05:31, Marcel Schneider via Unicode wrote: > > On Mon, 29 Jan 2018 11:13:21 -0700, Tom Gewecke wrote: >> >>> On Jan 29, 2018, at 4:26 AM, Marcel Schneider via Unicode wrote: >>> >>> >>> the Windows US-Intl >>> does not allow to write French in a

Re: UNICODE vehicle vanity registration?

2018-02-14 Thread Alastair Houghton via Unicode
On 14 Feb 2018, at 16:29, Shriramana Sharma via Unicode wrote: > > Sorry but "UNICODE" does fit within those rules doesn't it? Yes. Stephane has misunderstood. (Shriramana meant the literal text “UNICODE”, which is indeed composed of letters A-Z and meets the definition

Re: Why so much emoji nonsense?

2018-02-14 Thread Alastair Houghton via Unicode
On 14 Feb 2018, at 13:25, Shriramana Sharma via Unicode wrote: > > From a mail which I had sent to two other Unicode contributors just a > few days ago: > > Frankly I agree that this whole emoji thing is a Pandora box. It > should have been restricted to emoticons to

Re: Translating the standard

2018-03-12 Thread Alastair Houghton via Unicode
On 11 Mar 2018, at 21:14, Marcel Schneider via Unicode wrote: > > Indeed, to be fair. And for implementers, documenting themselves in English > may scarcely ever have much of a problem, no matter whatʼs the locale. Agreed. Implementers will already understand English;