Re: Acquiring DIS 10646

2015-10-04 Thread Sean Leonard
On 10/3/2015 12:28 PM, Asmus Freytag (t) wrote: On 10/3/2015 8:15 AM, Sean Leonard wrote: Thanks. Well, "DIS 10646" is the Draft International Standard, particularly Draft 1, from ~1990 or ~1991. (Sometimes it might have been called 10646.1.) Therefore it would likely only be in print form

Deleting Lone Surrogates

2015-10-04 Thread Richard Wordingham
In the absence of a specific tailoring, is the combination of a lone surrogate and a combining mark a user-perceived character? Does a lone surrogate constitute a user-perceived character? The problem I have is that because of an application-specific bug, when I attempt to enter the sequence

Re: Deleting Lone Surrogates

2015-10-04 Thread Mark Davis ☕️
When I use http://unicode.org/cldr/utility/breaks.jsp, it does show the sequence ᒏ�ᒺ as just two grapheme clusters. In #29 we are specifically not concerned about ill-formed text (or other degenerate cases). I suppose it would be possible to handle isolated surrogates in a different way (e.g. always

Re: Deleting Lone Surrogates

2015-10-04 Thread Philippe Verdy
IMHO, isolated surrogates are not valid starters for combining sequences; they must remain isolated: deleting this surrogate in your text editor should not delete the following combining mark, which is a separate cluster (even if that cluster is defective before the deletion as it has NO base

Re: Deleting Lone Surrogates

2015-10-04 Thread Markus Scherer
I would not spend any time specifying intricate rules for unpaired surrogates in 16-bit strings, or out-of-range values in 32-bit strings. Most processing will treat them like unassigned characters, like U+50005, with only default behaviors. markus
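
For readers following the thread, here is a minimal sketch of what "unpaired surrogate in a 16-bit string" means in practice. It is not taken from any message in the thread; the function name find_lone_surrogates and the list-of-integers input format are assumptions made for illustration. It scans a sequence of UTF-16 code units and reports the positions of surrogates that are not part of a well-formed high/low pair.

```python
def find_lone_surrogates(units):
    """Return the indices of unpaired surrogates in a list of UTF-16 code units.

    Hypothetical helper for illustration only; 'units' is a list of ints
    in the range 0x0000..0xFFFF.
    """
    lone = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: well-formed only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # skip the whole pair
                continue
            lone.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate that was not consumed as the second half of a pair.
            lone.append(i)
        i += 1
    return lone
```

With the input [0x0041, 0xD800, 0x0301] (a letter, a lone high surrogate, then a combining acute accent), the sketch flags only index 1, which is the situation the thread is debating: whether the combining mark at index 2 should be grouped with the lone surrogate for deletion purposes.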

Re: Deleting Lone Surrogates

2015-10-04 Thread Asmus Freytag (t)
On 10/4/2015 6:02 AM, Richard Wordingham wrote: In the absence of a specific tailoring, is the combination of a lone surrogate and a combining mark a user-perceived character? Does a lone surrogate constitute a user-perceived character? In an editing

Re: Deleting Lone Surrogates

2015-10-04 Thread Richard Wordingham
On Sun, 4 Oct 2015 10:50:43 -0700 Markus Scherer wrote: > I would not spend any time specifying intricate rules for unpaired > surrogates in 16-bit strings, or out-of range values in 32-bit > strings. Most processing will treat them like unassigned characters, > like

Re: Deleting Lone Surrogates

2015-10-04 Thread Asmus Freytag (t)
On 10/4/2015 12:38 PM, Richard Wordingham wrote: On Sun, 4 Oct 2015 10:50:43 -0700 Markus Scherer wrote: I would not spend any time specifying intricate rules for unpaired surrogates in 16-bit strings, or out-of range values

Re: Deleting Lone Surrogates

2015-10-04 Thread Richard Wordingham
On Sun, 4 Oct 2015 21:48:12 +0200 Philippe Verdy wrote: > 2015-10-04 21:30 GMT+02:00 Richard Wordingham < > richard.wording...@ntlworld.com>: > > On Sun, 4 Oct 2015 15:44:32 +0200 > > Mark Davis ☕️ wrote: > > > When I use

Re: Acquiring DIS 10646

2015-10-04 Thread Asmus Freytag (t)
On 10/4/2015 5:30 AM, Sean Leonard wrote: On 10/3/2015 12:28 PM, Asmus Freytag (t) wrote: On 10/3/2015 8:15 AM, Sean Leonard wrote: Thanks. Well, "DIS 10646" is the Draft International Standard,

Re: Deleting Lone Surrogates

2015-10-04 Thread Richard Wordingham
On Sun, 4 Oct 2015 15:44:32 +0200 Mark Davis ☕️ wrote: > When I use http://unicode.org/cldr/utility/breaks.jsp, it does show > the sequence ᒏ�ᒺ as just two grapheme clusters. But that's the sequence , which has no lone surrogates at all! (I had to

Re: Deleting Lone Surrogates

2015-10-04 Thread Philippe Verdy
2015-10-04 21:30 GMT+02:00 Richard Wordingham < richard.wording...@ntlworld.com>: > On Sun, 4 Oct 2015 15:44:32 +0200 > Mark Davis ☕️ wrote: > > > When I use http://unicode.org/cldr/utility/breaks.jsp, it does show > > the sequence ᒏ�ᒺ as just two grapheme clusters. > > But

Re: Deleting Lone Surrogates

2015-10-04 Thread Richard Wordingham
On Sun, 4 Oct 2015 12:30:23 -0700 "Asmus Freytag (t)" wrote: > If you have a bug that doesn't let you enter a sequence without > creating a lone surrogate followed by a combining mark, that's a > bug... Unfortunately, the bug appears to be in an ill-defined interface in

Re: Deleting Lone Surrogates

2015-10-04 Thread Philippe Verdy
The default behavior of unassigned characters is to treat them like base characters, so if they are followed by a combining mark, it would create a default grapheme cluster, which is not appropriate here. Surrogates are not characters (so they cannot have any character properties), but they are

Re: Deleting Lone Surrogates

2015-10-04 Thread Asmus Freytag (t)
On 10/4/2015 2:35 PM, Richard Wordingham wrote: However my opinion is that ᒏ�ᒺ (using U+FFFD substitution) gives 2 > grapheme clusters, I would prefer a solution that gives 3 grapheme > clusters, as if the lone surrogate was a line-break control, so that

Re: Deleting Lone Surrogates

2015-10-04 Thread Asmus Freytag (t)
On 10/4/2015 4:14 PM, Richard Wordingham wrote: respect to what to erase or undo. For sequences that belong to a given language, you can pick the behavior that makes most sense in them, but for lone surrogates, by definition you are dealing

Re: Deleting Lone Surrogates

2015-10-04 Thread Richard Wordingham
On Sun, 4 Oct 2015 16:57:15 -0700 "Asmus Freytag (t)" wrote: > On 10/4/2015 4:14 PM, Richard Wordingham wrote: > respect to what to erase or undo. >>> For sequences that belong to a given language, you can pick the >>> behavior that makes most sense in them, but for

Re: Deleting Lone Surrogates

2015-10-04 Thread Richard Wordingham
On Sun, 4 Oct 2015 15:34:13 -0700 "Asmus Freytag (t)" wrote: > On 10/4/2015 2:35 PM, Richard Wordingham wrote: >> I'd much prefer to be able to delete the first character of a >> grapheme >> cluster. It's annoying to have to retype 4 characters because one's >>

Re: NNBSP and Word Boundaries

2015-10-04 Thread Richard Wordingham
On Fri, 2 Oct 2015 09:25:01 +0200 Mark Davis ☕️ wrote: > We add: > > WB13c Mongolian_Letter × NNBSP > WB13d NNBSP × Mongolian_Letter > > *If* we want to also change behavior on the other side of the NNBSP, > whenever the Mongolian_Letter and NNBSP occur in sequence, we add
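
As a rough illustration of the two quoted rules only (WB13c and WB13d), the sketch below checks whether a word break would be suppressed between two adjacent characters. Several things here are assumptions and not part of the proposal: the function names, and in particular the approximation of Mongolian_Letter as "any character whose Unicode name starts with MONGOLIAN LETTER". The rest of the word-boundary rules are not modelled at all.

```python
import unicodedata

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def is_mongolian_letter(ch):
    # Assumption: approximate the proposed Mongolian_Letter class by name prefix.
    return unicodedata.name(ch, "").startswith("MONGOLIAN LETTER")


def no_break_between(a, b):
    """Return True if the quoted WB13c or WB13d rule forbids a break between a and b."""
    wb13c = is_mongolian_letter(a) and b == NNBSP  # Mongolian_Letter x NNBSP
    wb13d = a == NNBSP and is_mongolian_letter(b)  # NNBSP x Mongolian_Letter
    return wb13c or wb13d
```

For example, no_break_between("\u1820", "\u202f") holds under these assumptions, since U+1820 is MONGOLIAN LETTER A, while a Latin letter next to NNBSP is not covered by either rule.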

Re: Deleting Lone Surrogates

2015-10-04 Thread Richard Wordingham
On Sun, 4 Oct 2015 14:29:16 -0700 "Asmus Freytag (t)" wrote: > On 10/4/2015 12:38 PM, Richard Wordingham wrote: > The problem you are trying to solve is to allow editing on > the code point level, or, if you will, the keystroke level. > Generally, there will be a sweet