Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-22 Thread Philippe Verdy
Note also this statement at the beginning of the specification: Single boundaries. Each rule has exactly one boundary position. This restriction is more a limitation on the specification methods, because a rule with multiple boundaries could be expressed instead as multiple rules. For example: *

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-22 Thread Philippe Verdy
IMHO, the ZWJ should glue with the last symbol, following your examples. But the combining diaeresis following the ZWJ extends it (even if, in my opinion, it is "defective" and would likely display on a dotted circle in renderers, though it is not defective under the string definition of combining sequences).

Potential contradiction between the WordBreak test data and UAX #29

2016-11-22 Thread Tom Hacohen
Dear, I recently updated libunibreak[1] according to Unicode 9.0.0. I thought I had implemented it correctly; however, it fails against two of the tests in the reference test data: ÷ 200D × 0308 ÷ 2764 ÷ # ÷ [0.2] ZERO WIDTH JOINER (ZWJ_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0]
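
For readers unfamiliar with the test data format quoted above, the following is a minimal sketch of how such a line might be read. In the break test files, ÷ marks a position where a break is allowed and × marks a position where it is not; everything after # is a comment. The function name parse_break_test_line is hypothetical and is not part of libunibreak or of the Unicode data files.

```python
def parse_break_test_line(line):
    """Split one line of break test data into code points and boundary flags.

    Returns (code_points, breaks). breaks[i] is True when a break is allowed
    before code_points[i]; the final entry describes the end of the text.
    """
    # Everything after '#' is a human-readable comment.
    data = line.split("#", 1)[0].strip()
    code_points = []
    breaks = []
    for token in data.split():
        if token == "\u00F7":      # DIVISION SIGN: break allowed here
            breaks.append(True)
        elif token == "\u00D7":    # MULTIPLICATION SIGN: no break here
            breaks.append(False)
        else:                      # hexadecimal code point
            code_points.append(int(token, 16))
    return code_points, breaks


# The first failing test quoted in the message (comment shortened).
cps, brks = parse_break_test_line(
    "\u00F7 200D \u00D7 0308 \u00F7 2764 \u00F7 # ZWJ, COMBINING DIAERESIS, ..."
)
print([f"{cp:04X}" for cp in cps], brks)
```

Under this reading, the test expects no break between U+200D and U+0308, and a break between U+0308 and U+2764.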

Re: Bidi: inserting Japanese paragraphs in Arabic/Farsi document

2016-11-22 Thread Asmus Freytag (c)
On 11/21/2016 5:47 PM, Philippe Verdy wrote: Look at where the Asian quotes are partially "moved" by the ASCII quotes in Chrome. How does Chrome enter into this? (What I posted is a screenshot from Thunderbird on Windows 7). It seems to fully match up the example using the UPPER/lower