Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Asmus Freytag
On 4/23/2014 7:37 PM, Philippe Verdy wrote: Thanks for the clear reply, now I know that my example in a prior message would work appropriately with UBA: This is an [«] ARABIC EXAMPLE [»] for demonstration only. Because: - the opening guillemet is not stripped out of the context stack when

Re: Do `Grapheme_Extend` characters only apply to `Grapheme_Base`?

2014-04-24 Thread Mathias Bynens
On 23 Apr 2014, at 22:16, Mathias Bynens math...@qiwi.be wrote: Let’s say I’m writing a program that strips combining characters and grapheme extenders from an input string. For combining marks, I’m looking for any non-combining marks (e.g. `a`) followed by one or more combining marks
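
The combining-mark half of the task described above can be sketched in Python. This is a minimal illustration, not the poster's program: the function name and the `decompose` flag are assumptions, and "combining mark" is taken to mean General_Category Mn, Mc, or Me. Python's `unicodedata` module does not expose Grapheme_Extend, so grapheme extenders that are not marks are left untouched here.

```python
import unicodedata


def strip_combining_marks(text: str, decompose: bool = False) -> str:
    """Drop every code point whose General_Category starts with 'M'.

    With decompose=True the text is first converted to NFD, so that
    precomposed letters such as U+00E3 also lose their accent.
    (Hypothetical helper; name and flag are illustrative only.)
    """
    if decompose:
        text = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("M")
    )


if __name__ == "__main__":
    print(strip_combining_marks("man\u0303ana"))
    print(strip_combining_marks("\u00e3", decompose=True))
```

Whether to decompose first is a design choice: without it, a precomposed character carries no separate mark to strip and is returned unchanged.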

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Eli Zaretskii
Date: Thu, 24 Apr 2014 00:28:50 -0700 From: Asmus Freytag asm...@ix.netcom.com CC: k...@unicode.org, Eli Zaretskii e...@gnu.org, James Clark j...@jclark.com, unicode Unicode Discussion unicode@unicode.org On 4/23/2014 7:37 PM, Philippe Verdy wrote: Thanks for the clear reply, now I

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Philippe Verdy
2014-04-24 16:39 GMT+02:00 Eli Zaretskii e...@gnu.org: In addition, assuming that by guillemets Philippe means U+00AB and U+00BB, guillemet is THE correct name, even in English. guillemot comes from an old typo. If you don't want this term in English you can still use double angle

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Eli Zaretskii
From: Philippe Verdy verd...@wanadoo.fr Date: Thu, 24 Apr 2014 17:11:23 +0200 Cc: Asmus Freytag asm...@ix.netcom.com, Ilya Zakharevich nospam-ab...@ilyaz.org, k...@unicode.org, James Clark j...@jclark.com, unicode Unicode Discussion unicode@unicode.org In addition, assuming that

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Asmus Freytag
On 4/24/2014 8:20 AM, Eli Zaretskii wrote: So nothing (at least not the reason of the GC which is just an intermediate but incomplete helper) forbids the guillemets to be listed in BidiBrackets.txt. They don't satisfy the conditions for that. From BidiBrackets.txt: Philippe is incorrect once
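
The point about guillemets not meeting the BidiBrackets.txt conditions can be checked against the character properties. The sketch below assumes, from the header of BidiBrackets.txt as I understand it, that listed pairs have General_Category Ps (opening) and Pe (closing) together with Bidi_Class ON; the helper name is hypothetical.

```python
import unicodedata


def bracket_relevant_properties(ch: str) -> dict:
    """Return the properties that the bracket-pair derivation looks at."""
    return {
        "gc": unicodedata.category(ch),        # General_Category
        "bc": unicodedata.bidirectional(ch),   # Bidi_Class
        "mirrored": bool(unicodedata.mirrored(ch)),
    }


if __name__ == "__main__":
    # Parentheses versus the guillemets U+00AB and U+00BB.
    for ch in "()\u00ab\u00bb":
        print(f"U+{ord(ch):04X}", bracket_relevant_properties(ch))
```

The parentheses report Ps and Pe, while the guillemets report Pi and Pf (initial and final quotation marks), which is why they fall outside that derivation.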

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Asmus Freytag
On 4/24/2014 7:39 AM, Eli Zaretskii wrote: This is _*incorrect*_, see the text in blue/bold in the definition copied below. The second bullet in item 3 of the second second-level bullet of the third top-level bullet of BD16 clearly says that all elements that are above the matched element are
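
The stack behaviour under discussion (everything above the matched element is discarded along with it) can be sketched as follows. This is a simplified reading of BD16, not the normative text: the `PAIRS` table is a hypothetical stand-in for BidiBrackets.txt, canonical equivalents are ignored, the input is a plain string rather than an isolating run sequence, and the 63-element limit is my understanding of UAX #9.

```python
from typing import Dict, List, Tuple

# Hypothetical, tiny stand-in for the Bidi_Paired_Bracket data.
PAIRS: Dict[str, str] = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())
MAX_DEPTH = 63  # assumed stack limit, as I read BD16


def find_bracket_pairs(text: str) -> List[Tuple[int, int]]:
    """Return (opener position, closer position) pairs, sorted by opener."""
    stack: List[Tuple[str, int]] = []  # (expected closer, opener position)
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            if len(stack) == MAX_DEPTH:
                break  # no room left: this sketch simply stops scanning
            stack.append((PAIRS[ch], pos))
        elif ch in CLOSERS:
            # Search from the top of the stack downwards for a match.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == ch:
                    pairs.append((stack[i][1], pos))
                    # Pop the matched element and everything above it.
                    del stack[i:]
                    break
            # No match: the closer is skipped and the stack is unchanged.
    return sorted(pairs)


if __name__ == "__main__":
    print(find_bracket_pairs("a(b[c)d]"))
```

In `a(b[c)d]` the `)` matches the `(` below the `[`, so the `[` is popped with it and the later `]` finds nothing to pair with.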

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Philippe Verdy
2014-04-24 17:20 GMT+02:00 Eli Zaretskii e...@gnu.org: From: Philippe Verdy verd...@wanadoo.fr Date: Thu, 24 Apr 2014 17:11:23 +0200 Cc: Asmus Freytag asm...@ix.netcom.com, Ilya Zakharevich nospam-ab...@ilyaz.org, k...@unicode.org, James Clark j...@jclark.com, unicode Unicode

Do 'Grapheme_Extend' characters only apply to 'Grapheme_Base'?

2014-04-24 Thread Doug Ewell
Mathias Bynens mathias at qiwi dot be wrote: Let's say I'm writing a program that strips combining characters and grapheme extenders from an input string. For combining marks, I'm looking for any non-combining marks (e.g. 'a') followed by one or more combining marks (e.g. ' ̃'), and then I

RE: Do `Grapheme_Extend` characters only apply to `Grapheme_Base`?

2014-04-24 Thread Whistler, Ken
On 23 Apr 2014, at 22:16, Mathias Bynens math...@qiwi.be wrote: Let’s say I’m writing a program that strips combining characters and grapheme extenders from an input string. For combining marks, I’m looking for any non-combining marks (e.g. `a`) followed by one or more combining marks

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: [...] And at least your original message used and transliterations, not the actual characters. No I used the «» characters exactly like here. I absolutely never use the ASCII

Re: ID_Start, ID_Continue, and stability extensions

2014-04-24 Thread Steffen Nurpmeso
Markus Scherer markus@gmail.com wrote: |I strongly recommend you parse the derived properties rather than trying to |follow the derivation formula, because that can change over time. ..this file includes only those core properties that have themselves a derivation-may-change property? (I

Re: ID_Start, ID_Continue, and stability extensions

2014-04-24 Thread Markus Scherer
On Thu, Apr 24, 2014 at 12:56 PM, Steffen Nurpmeso sdao...@yandex.com wrote: Markus Scherer markus@gmail.com wrote: |I strongly recommend you parse the derived properties rather than trying to |follow the derivation formula, because that can change over time. ..this file includes only
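
The recommendation to parse the derived properties rather than re-derive them can be sketched as a small reader for the UCD range-list format (`XXXX..YYYY ; Property # comment`). The function names are hypothetical, and the sample lines only illustrate the shape of DerivedCoreProperties.txt; a real program would read the file shipped with the Unicode version it targets.

```python
from typing import Dict, Iterable, List, Tuple

Ranges = Dict[str, List[Tuple[int, int]]]


def parse_property_file(lines: Iterable[str]) -> Ranges:
    """Collect inclusive code point ranges per property name."""
    ranges: Ranges = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.setdefault(fields[1], []).append((start, end))
    return ranges


def has_property(ranges: Ranges, prop: str, ch: str) -> bool:
    """True if ch falls inside any range recorded for prop."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges.get(prop, []))


# Illustrative sample in the file's format (not a complete data set).
SAMPLE = """\
# DerivedCoreProperties-style sample
0041..005A    ; ID_Start # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; ID_Start # FEMININE ORDINAL INDICATOR
0030..0039    ; ID_Continue # Nd  [10] DIGIT ZERO..DIGIT NINE
""".splitlines()

if __name__ == "__main__":
    table = parse_property_file(SAMPLE)
    print(has_property(table, "ID_Start", "A"))
    print(has_property(table, "ID_Start", "0"))
```

Reading the published ranges keeps the program correct even if the derivation formula changes in a later Unicode version, which is the concern raised in the message.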

Re: Do `Grapheme_Extend` characters only apply to `Grapheme_Base`?

2014-04-24 Thread Mathias Bynens
On 24 Apr 2014, at 21:38, Whistler, Ken ken.whist...@sap.com wrote: Grapheme_Extend characters per se do not apply to anything. They are a mixture of different General_Category types -- mostly combining marks, but not all. The concept of applying to a base only refers to combining marks

Bidi Brackets for Dummies

2014-04-24 Thread Whistler, Ken
Given the incredible level of interest shown on this list during the last week, I am glad that I can finally announce the publication of Bidi Brackets for Dummies: http://www.unicode.org/notes/tr39/ I had wanted to publish that several weeks ago, but unfortunately, publication was held up for

Re: Bidi Brackets for Dummies

2014-04-24 Thread Markus Scherer
tn not tr http://www.unicode.org/notes/tn39/ markus

Re: Do `Grapheme_Extend` characters only apply to `Grapheme_Base`?

2014-04-24 Thread Richard Wordingham
On Thu, 24 Apr 2014 19:38:54 + Whistler, Ken ken.whist...@sap.com wrote: Yes. Grapheme_Extend characters per se do not apply to anything. They are a mixture of different General_Category types -- mostly combining marks, but not all. The concept of applying to a base only refers to

Re: Bidi Brackets for Dummies

2014-04-24 Thread Richard COOK
On Apr 24, 2014, at 2:16 PM, Whistler, Ken wrote: Given the incredible level of interest shown on this list during the last week, I am glad that I can finally announce the publication of Bidi Brackets for Dummies: http://www.unicode.org/notes/tn39/ Dear Dr. Ken, Thanks ever so much for

Re: Do `Grapheme_Extend` characters only apply to `Grapheme_Base`?

2014-04-24 Thread Richard Wordingham
On Thu, 24 Apr 2014 23:07:58 +0200 Mathias Bynens math...@qiwi.be wrote: I realize reversing a string has nothing to do with text segmentation – but ignoring grapheme extenders leads to unexpected results (since after reversing the code points, the grapheme extender might extend the wrong
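
The reversal problem quoted above can be illustrated with a short sketch. It approximates "grapheme extender" by General_Category M* because Python's `unicodedata` does not expose Grapheme_Extend, so it is not a full UAX #29 segmentation; the function name is hypothetical.

```python
import unicodedata


def reverse_keeping_marks(text: str) -> str:
    """Reverse text while keeping each mark attached to the preceding base."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # glue the mark onto the previous cluster
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


if __name__ == "__main__":
    word = "man\u0303ana"  # the tilde belongs to the first 'n'
    print(word[::-1])                   # naive: tilde lands on an 'a'
    print(reverse_keeping_marks(word))  # tilde stays on the 'n'
```

Reversing code point by code point moves U+0303 in front of a different letter, which is the unexpected result the message describes.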

Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

2014-04-24 Thread Asmus Freytag
On this side show, Philippe finally is correct, because I received his message without ASCII-i-fication; he cc'd me directly, and I never saw the mangled text. It's a bit embarrassing for a Unicode mail list to not even be able to let guillemets through unmolested. But this shall not distract