On 10/3/2015 12:28 PM, Asmus Freytag (t) wrote:
On 10/3/2015 8:15 AM, Sean Leonard wrote:
Thanks.
Well, "DIS 10646" is the Draft International Standard, particularly
Draft 1, from ~1990 or ~1991. (Sometimes it might have been called
10646.1.) Therefore it would likely only be in print form
In the absence of a specific tailoring, is the combination of a lone
surrogate and a combining mark a user-perceived character? Does a lone
surrogate constitute a user-perceived character?
The problem I have is that because of an application-specific bug,
when I attempt to enter the sequence
When I use http://unicode.org/cldr/utility/breaks.jsp, it does show the
sequence ᒏ�ᒺ as just two grapheme clusters.
In #29 we are specifically not concerned about ill-formed text (or other
degenerate cases). I suppose it would be possible to handle isolated
surrogates in a different way (e.g. always
IMHO, isolated surrogates are not valid starters for combining sequences;
they must remain isolated: deleting this surrogate in your text editor
should not delete the following combining mark which is a separate cluster
(even if that cluster is defective before the deletion as it has NO base
I would not spend any time specifying intricate rules for unpaired
surrogates in 16-bit strings, or out-of-range values in 32-bit strings.
Most processing will treat them like unassigned characters, like U+50005,
with only default behaviors.
markus
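As a rough illustration of what "unpaired surrogates in 16-bit strings" means in practice, here is a minimal Python sketch that scans a sequence of 16-bit code units and reports where lone surrogates occur. The function name `find_lone_surrogates` and the list-of-integers input are assumptions made for this example only; nothing here comes from the thread, from UAX #29, or from any particular library.

```python
def find_lone_surrogates(units):
    """Return the indices of unpaired surrogates in a list of UTF-16 code units.

    Hypothetical helper for illustration: `units` is a list of integers
    in the range 0x0000..0xFFFF.
    """
    lone = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High (lead) surrogate: well-formed only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # skip the whole well-formed pair
                continue
            lone.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low (trail) surrogate with no preceding high surrogate.
            lone.append(i)
        i += 1
    return lone


# A lone high surrogate followed by U+0301 COMBINING ACUTE ACCENT,
# the kind of sequence discussed in this thread.
print(find_lone_surrogates([0x0041, 0xD800, 0x0301]))
```

How a segmenter should then treat the reported positions (as a base for the following combining mark, or as an isolated cluster) is exactly the open question in the messages here; the sketch only locates them.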
On 10/4/2015 6:02 AM, Richard Wordingham wrote:
In the absence of a specific tailoring, is the combination of a lone
surrogate and a combining mark a user-perceived character? Does a lone
surrogate constitute a user-perceived character?
In an editing
On Sun, 4 Oct 2015 10:50:43 -0700
Markus Scherer wrote:
> I would not spend any time specifying intricate rules for unpaired
> surrogates in 16-bit strings, or out-of-range values in 32-bit
> strings. Most processing will treat them like unassigned characters,
> like
On 10/4/2015 12:38 PM, Richard Wordingham wrote:
On Sun, 4 Oct 2015 10:50:43 -0700
Markus Scherer wrote:
I would not spend any time specifying intricate rules for unpaired
surrogates in 16-bit strings, or out-of-range values
On Sun, 4 Oct 2015 21:48:12 +0200
Philippe Verdy wrote:
> 2015-10-04 21:30 GMT+02:00 Richard Wordingham <
> richard.wording...@ntlworld.com>:
> > On Sun, 4 Oct 2015 15:44:32 +0200
> > Mark Davis ☕️ wrote:
> > > When I use
On 10/4/2015 5:30 AM, Sean Leonard wrote:
On 10/3/2015 12:28 PM, Asmus Freytag (t) wrote:
On 10/3/2015 8:15 AM, Sean Leonard wrote:
Thanks.
Well, "DIS 10646" is the Draft International Standard,
On Sun, 4 Oct 2015 15:44:32 +0200
Mark Davis ☕️ wrote:
> When I use http://unicode.org/cldr/utility/breaks.jsp, it does show
> the sequence ᒏ�ᒺ as just two grapheme clusters.
But that's the sequence , which has no lone
surrogates at all! (I had to
2015-10-04 21:30 GMT+02:00 Richard Wordingham <
richard.wording...@ntlworld.com>:
> On Sun, 4 Oct 2015 15:44:32 +0200
> Mark Davis ☕️ wrote:
>
> > When I use http://unicode.org/cldr/utility/breaks.jsp, it does show
> > the sequence ᒏ�ᒺ as just two grapheme clusters.
>
> But
On Sun, 4 Oct 2015 12:30:23 -0700
"Asmus Freytag (t)" wrote:
> If you have a bug that doesn't let you enter a sequence without
> creating a lone surrogate followed by a combining mark, that's a
> bug...
Unfortunately, the bug appears to be in an ill-defined interface in
The default behavior for unassigned characters is to treat them like base
characters, so if they are followed by a combining mark, it would create a
default grapheme cluster, which is not appropriate here.
Surrogates are not characters (so they cannot have any character
properties), but they are
On 10/4/2015 2:35 PM, Richard Wordingham wrote:
However my opinion is that ᒏ�ᒺ (using U+FFFD substitution) gives 2
grapheme clusters, I would prefer a solution that gives 3 grapheme
clusters, as if the lone surrogate was a line-break control, so that
On 10/4/2015 4:14 PM, Richard Wordingham wrote:
respect to what to erase or undo.
For sequences that belong to a given language, you can pick the
behavior that makes most sense in them, but for lone surrogates, by
definition you are dealing
On Sun, 4 Oct 2015 16:57:15 -0700
"Asmus Freytag (t)" wrote:
> On 10/4/2015 4:14 PM, Richard Wordingham wrote:
> respect to what to erase or undo.
>>> For sequences that belong to a given language, you can pick the
>>> behavior that makes most sense in them, but for
On Sun, 4 Oct 2015 15:34:13 -0700
"Asmus Freytag (t)" wrote:
> On 10/4/2015 2:35 PM, Richard Wordingham wrote:
>> I'd much prefer to be able to delete the first character of a grapheme
>> cluster. It's annoying to have to retype 4 characters because one's
>>
On Fri, 2 Oct 2015 09:25:01 +0200
Mark Davis ☕️ wrote:
> We add:
>
> WB13c Mongolian_Letter × NNBSP
> WB13d NNBSP × Mongolian_Letter
>
> *If* we want to also change behavior on the other side of the NNBSP,
> whenever the Mongolian_Letter and NNBSP occur in sequence, we add
On Sun, 4 Oct 2015 14:29:16 -0700
"Asmus Freytag (t)" wrote:
> On 10/4/2015 12:38 PM, Richard Wordingham wrote:
> The problem you are trying to solve is to allow editing on
> the code point level, or, if you will, the keystroke level.
> Generally, there will be a sweet
20 matches