Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-11-22 Thread Henri Sivonen via Unicode
On Wed, Jun 13, 2018 at 2:49 PM Mark Davis ☕️ wrote: > > > That is, why is conforming to UAX #31 worth the risk of prohibiting the use > > of characters that some users might want to use? > > One could parse for certain sequences, putting characters into a number of > broad categories. Very

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-13 Thread Mark Davis ☕️ via Unicode
> That is, why is conforming to UAX #31 worth the risk of prohibiting the use of characters that some users might want to use? One could parse for certain sequences, putting characters into a number of broad categories. Very approximately: - junk ~= [[:cn:][:cs:][:co:]]+ - whitespace ~=
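
Editor's sketch of the "junk" category from the message above. The set [[:cn:][:cs:][:co:]] corresponds to the General_Category values Cn (unassigned), Cs (surrogate) and Co (private use). The helper names `is_junk` and `split_junk` are hypothetical, and only the junk class is modelled because the preview is cut off before the other categories are listed.

```python
import unicodedata

# General_Category values matched by [[:cn:][:cs:][:co:]]
JUNK_CATEGORIES = {"Cn", "Cs", "Co"}


def is_junk(ch):
    """Return True if a single character is unassigned, a surrogate, or private use."""
    return unicodedata.category(ch) in JUNK_CATEGORIES


def split_junk(text):
    """Split text into maximal runs, each tagged (is_junk, run)."""
    runs = []
    for ch in text:
        junk = is_junk(ch)
        if runs and runs[-1][0] == junk:
            # Extend the current run when the classification does not change.
            runs[-1] = (junk, runs[-1][1] + ch)
        else:
            runs.append((junk, ch))
    return runs
```

Note that the result depends on the Unicode version bundled with the Python interpreter, since unassigned code points become assigned over time.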

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-08 Thread Hans Åberg via Unicode
> On 8 Jun 2018, at 11:07, Henri Sivonen via Unicode > wrote: > > My question is: > > When designing a syntax where tokens with the user-chosen characters > can't occur next to each other without some syntax-reserved characters > between them, what advantages are there from limiting the

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-08 Thread Henri Sivonen via Unicode
On Wed, Jun 6, 2018 at 2:55 PM, Henri Sivonen wrote: > Considering that ruling out too much can be a problem later, but just > treating anything above ASCII as opaque hasn't caused trouble (that I > know of) for HTML other than compatibility issues with XML's stricter > stance, why should a

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Frédéric Grosshans via Unicode
Le 07/06/2018 à 18:01, Alastair Houghton a écrit : I appreciate that the upshot of the Anglicised world of software engineering is that native English speakers have an advantage, and those for whom Latin isn’t

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Asmus Freytag via Unicode
On 6/7/2018 9:01 AM, Alastair Houghton via Unicode wrote: But please don’t misunderstand; I am not — and have not been — arguing against non-ASCII identifiers. We were asked whether there were any problems. These are problems (or perhaps we might

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Alastair Houghton via Unicode
On 7 Jun 2018, at 15:51, Frédéric Grosshans via Unicode wrote: > >> IMO the major issue with non-ASCII identifiers is not a technical one, but >> rather that it runs the risk of fragmenting the developer community. >> Everyone can *type* ASCII and everyone can read Latin characters (for >>

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Frédéric Grosshans via Unicode
Le 06/06/2018 à 11:29, Alastair Houghton via Unicode a écrit : On 4 Jun 2018, at 20:49, Manish Goregaokar via Unicode wrote: The Rust community is considering adding non-ascii identifiers, which follow UAX #31 (XID_Start XID_Continue*, with tweaks). The proposal also asks for identifiers to

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Mark Davis ☕️ via Unicode
Got it, thanks. Mark On Thu, Jun 7, 2018 at 3:29 PM, Richard Wordingham via Unicode < unicode@unicode.org> wrote: > On Thu, 7 Jun 2018 10:42:46 +0200 > Mark Davis ☕️ via Unicode wrote: > > > > The proposal also asks for identifiers to be treated as equivalent > > > under > > NFKC. > > > > The

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Richard Wordingham via Unicode
On Thu, 7 Jun 2018 10:42:46 +0200 Mark Davis ☕️ via Unicode wrote: > > The proposal also asks for identifiers to be treated as equivalent > > under > NFKC. > > The guidance in #31 may not be clear. It is not to replace > identifiers as typed in by the user by their NFKC equivalent. It is >

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Richard Wordingham via Unicode
On Thu, 7 Jun 2018 13:32:13 +0200 Joan Montané via Unicode wrote: > 2018-06-04 21:49 GMT+02:00 Manish Goregaokar via Unicode < > unicode@unicode.org>: > * Ŀ, LATIN CAPITAL LETTER L WITH MIDDLE DOT NFKC decomposes > to LATIN CAPITAL LETTER L (U+004C) MIDDLE DOT (U+00B7): > * ŀ, LATIN SMALL
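
Editor's sketch of the decomposition quoted above, using Python's standard `unicodedata` module. The helper name `nfkc_codepoints` is hypothetical. Whether the normalized form still counts as an identifier is printed rather than asserted, because `str.isidentifier()` follows Python's own identifier rules, which need not match the Rust proposal.

```python
import unicodedata


def nfkc_codepoints(text):
    """Return the code points of the NFKC form of text as U+XXXX strings."""
    normalized = unicodedata.normalize("NFKC", text)
    return [f"U+{ord(c):04X}" for c in normalized]


if __name__ == "__main__":
    for ch in ("\u013F", "\u0140"):  # Ŀ and ŀ
        normalized = unicodedata.normalize("NFKC", ch)
        print(unicodedata.name(ch), "->", nfkc_codepoints(ch))
        # Python's notion of an identifier, shown only as a point of comparison.
        print("  NFKC form is a Python identifier:", normalized.isidentifier())
```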

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Philippe Verdy via Unicode
If you intend to allow all the standard orthography of common languages, you would also need to support apostrophes and regular hyphens in identifiers, including those from ASCII ! The Catalan middle dot is just a compact variant of the hyphen, it should have better been a diacritic, but the

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Joan Montané via Unicode
2018-06-04 21:49 GMT+02:00 Manish Goregaokar via Unicode < unicode@unicode.org>: > Hi, > > The Rust community is considering > adding non-ascii > identifiers, which follow UAX #31 > (XID_Start XID_Continue*, with

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Hans Åberg via Unicode
Now that the distinction is possible, it is recommended to do that. My original question was directed to the OP, whether it is deliberate. And they are confusables only to those not accustomed to it. > On 7 Jun 2018, at 12:05, Philippe Verdy wrote: > > In my opinion the usual constant is

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Philippe Verdy via Unicode
In my opinion the usual constant is most often shown as "𝜋" (curly serifs, slightly slanted) in mathematical articles and books (and in TeX), but rarely as "π" (sans-serif). There's a tradition of using handwriting for this symbol on blackboards (not always with serifs, but still often slanted).

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Mark Davis ☕️ via Unicode
> The proposal also asks for identifiers to be treated as equivalent under NFKC. The guidance in #31 may not be clear. It is not to replace identifiers as typed in by the user by their NFKC equivalent. It is rather to internally *identify* two identifiers (as typed in by the user) as being the
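
Editor's sketch of the distinction drawn above: the source text keeps the identifier exactly as typed, and NFKC is applied only to the key used for comparison. The names `identifier_key` and `SymbolTable` are hypothetical and not taken from UAX #31 or from any compiler.

```python
import unicodedata


def identifier_key(name):
    """Comparison key: the NFKC form of the identifier as typed."""
    return unicodedata.normalize("NFKC", name)


class SymbolTable:
    """Maps NFKC keys to (spelling as typed, value) pairs."""

    def __init__(self):
        self._by_key = {}

    def declare(self, name, value):
        # The original spelling is stored untouched; only the key is normalized.
        self._by_key[identifier_key(name)] = (name, value)

    def lookup(self, name):
        return self._by_key[identifier_key(name)]


if __name__ == "__main__":
    table = SymbolTable()
    table.declare("\ufb01le", 1)   # typed with U+FB01 LATIN SMALL LIGATURE FI
    print(table.lookup("file"))    # found via the shared NFKC key
```

Diagnostics can then quote the spelling the user actually typed, while two spellings with the same NFKC form resolve to the same entry.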

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Hans Åberg via Unicode
> On 7 Jun 2018, at 03:56, Asmus Freytag via Unicode > wrote: > > On 6/6/2018 2:25 PM, Hans Åberg via Unicode wrote: >>> On 4 Jun 2018, at 21:49, Manish Goregaokar via Unicode >>> wrote: >>> >>> The Rust community is considering adding non-ascii identifiers, which >>> follow UAX #31

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Alastair Houghton via Unicode
On 6 Jun 2018, at 17:50, Manish Goregaokar wrote: > > I think the recommendation to use ASCII as much as possible is implicit there. It would be a very good idea to make it explicit. Even for English speakers, there may be a temptation to use characters that are hard to distinguish or hard to

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Richard Wordingham via Unicode
On Tue, 5 Jun 2018 01:37:47 +0100 Richard Wordingham via Unicode wrote: > The decomposed > form that looks the same is นํ้า . > The problem is that for sane results, needs > special handling. This sequence is also often untypable - part of the > protection against Thai homographs. I've been

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Richard Wordingham via Unicode
On Mon, 4 Jun 2018 12:49:20 -0700 Manish Goregaokar via Unicode wrote: > Hi, > > The Rust community is considering > adding non-ascii > identifiers, which follow UAX #31 > (XID_Start XID_Continue*, with >

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Asmus Freytag via Unicode
On 6/6/2018 2:25 PM, Hans Åberg via Unicode wrote: On 4 Jun 2018, at 21:49, Manish Goregaokar via Unicode wrote: The Rust community is considering adding non-ascii identifiers, which follow UAX #31 (XID_Start XID_Continue*, with tweaks). The

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Hans Åberg via Unicode
> On 4 Jun 2018, at 21:49, Manish Goregaokar via Unicode > wrote: > > The Rust community is considering adding non-ascii identifiers, which follow > UAX #31 (XID_Start XID_Continue*, with tweaks). The proposal also asks for > identifiers to be treated as equivalent under NFKC. So, in this

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Henri Sivonen via Unicode
On Mon, Jun 4, 2018 at 10:49 PM, Manish Goregaokar via Unicode wrote: > The Rust community is considering adding non-ascii identifiers, which follow > UAX #31 (XID_Start XID_Continue*, with tweaks). UAX #31 is rather light on documenting its rationale. I realize that XML is a different case

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Philippe Verdy via Unicode
It could be argued that "modern" languages could use unique identifiers for their syntax or API independently of the name being rendered. The problem is that translated names may collide in non-obvious ways and become ambiguous. We've already seen the problems it caused in Excel with its translated

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Alastair Houghton via Unicode
On 5 Jun 2018, at 07:09, Martin J. Dürst via Unicode wrote: > > Hello Rebecca, > > On 2018/06/05 12:43, Rebecca T via Unicode wrote: > >> Something I’d love to see is translated keywords; shouldn’t be hard with a >> line in the cargo.toml for a rudimentary lookup. Again, I’m of the opinion >>

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-06 Thread Alastair Houghton via Unicode
On 4 Jun 2018, at 20:49, Manish Goregaokar via Unicode wrote: > > The Rust community is considering adding non-ascii identifiers, which follow > UAX #31 (XID_Start XID_Continue*, with tweaks). The proposal also asks for > identifiers to be treated as equivalent under NFKC. > > Are there any

Requiring typed text to be NFKC (was: Can NFKC turn valid UAX 31 identifiers into non-identifiers?)

2018-06-05 Thread Manish Goregaokar via Unicode
Following up from my previous email, one of the ideas that was brought up was that if we're going to consider NFKC forms equivalent, we should require things to be typed in NFKC. I'm a bit wary of this. As Richard brought up in
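
Editor's sketch of what "require things to be typed in NFKC" could look like as a check. It assumes Python 3.8 or later for `unicodedata.is_normalized`; the function name `is_typed_in_nfkc` is hypothetical.

```python
import unicodedata


def is_typed_in_nfkc(identifier):
    """Return True if the identifier, exactly as typed, is already in NFKC."""
    return unicodedata.is_normalized("NFKC", identifier)


if __name__ == "__main__":
    for name in ("file", "\ufb01le", "\u013F"):
        print(repr(name), is_typed_in_nfkc(name))
```

Under such a rule, an identifier containing Ŀ (U+013F) would be rejected as typed, even though its NFKC form L· could be accepted.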

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-05 Thread Martin J. Dürst via Unicode
Hello Rebecca, On 2018/06/05 12:43, Rebecca T via Unicode wrote: Something I’d love to see is translated keywords; shouldn’t be hard with a line in the cargo.toml for a rudimentary lookup. Again, I’m of the opinion that an imperfect implementation is better than no attempt. I remember reading

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-04 Thread Rebecca T via Unicode
I think that the benefits of inclusion from allowing non-ASCII identifiers far outweigh any corner cases this might cause. (Although ironing out and analyzing those is of course important, I don’t think they should be obstacles for implementing this kind of thing.) Something I’d love to see is

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-04 Thread Richard Wordingham via Unicode
On Mon, 4 Jun 2018 12:49:20 -0700 Manish Goregaokar via Unicode wrote: > Hi, > > The Rust community is considering > adding non-ascii > identifiers, which follow UAX #31 > (XID_Start XID_Continue*, with >

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-04 Thread Manish Goregaokar via Unicode
Oh, looks like UAX 31 has info on how to be closed under NFKC http://www.unicode.org/reports/tr31/#NFKC_Modifications -Manish On Mon, Jun 4, 2018 at 12:49 PM Manish Goregaokar wrote: > Hi, > > The Rust community is considering > adding non-ascii >

Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-04 Thread Manish Goregaokar via Unicode
Hi, The Rust community is considering adding non-ascii identifiers, which follow UAX #31 (XID_Start XID_Continue*, with tweaks). The proposal also asks for identifiers to be treated as equivalent under NFKC. Are
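
Editor's sketch of one way to probe the question in the subject line. It uses Python's `str.isidentifier()` as a stand-in for "XID_Start XID_Continue*"; that is an assumption, since Python's rule also admits a leading underscore and the Rust proposal mentions unspecified tweaks. The function names are hypothetical, and no result of the scan is claimed here.

```python
import unicodedata


def survives_nfkc(name):
    """True if name and its NFKC form are both identifiers under Python's rules."""
    normalized = unicodedata.normalize("NFKC", name)
    return name.isidentifier() and normalized.isidentifier()


def find_breakages(candidates):
    """Return the candidates that are identifiers but whose NFKC form is not."""
    return [
        c for c in candidates
        if c.isidentifier()
        and not unicodedata.normalize("NFKC", c).isidentifier()
    ]


def scan_single_code_points():
    """Run find_breakages over every one-character string (may take a few seconds)."""
    return find_breakages(chr(cp) for cp in range(0x110000))


if __name__ == "__main__":
    broken = scan_single_code_points()
    print(len(broken), "single code points stop being identifiers after NFKC")
```

A single-character scan only covers the start position; checking continuation characters would need candidates such as "a" followed by each code point.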