Dear,

I recently updated libunibreak[1] to Unicode 9.0.0. I thought I had implemented it correctly, but it fails two of the tests in the reference test data:

÷ 200D × 0308 ÷ 2764 ÷ # ÷ [0.2] ZERO WIDTH JOINER (ZWJ_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] HEAVY BLACK HEART (Glue_After_Zwj) ÷ [0.3]

and

÷ 200D × 0308 ÷ 1F466 ÷ # ÷ [0.2] ZERO WIDTH JOINER (ZWJ_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] BOY (EBG) ÷ [0.3]
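
For readers unfamiliar with the test file format: in the lines above, ÷ marks a break opportunity, × marks no break, and everything after # is a comment carrying the rule numbers. A minimal sketch of reading such a line follows. The function name parse_break_test is made up for illustration; it is not part of libunibreak or of the reference implementation.

```python
def parse_break_test(line):
    """Split one break-test line into code points and boundary flags.

    Returns (code_points, breaks) where breaks[i] is True if the test
    expects a break opportunity before code_points[i]; the final entry
    of breaks describes the boundary at end of text.
    """
    # Drop the trailing comment; only the data part is parsed here.
    data = line.split("#", 1)[0].split()
    # Tokens alternate: marker, code point, marker, ..., marker.
    markers = data[0::2]
    code_points = [int(tok, 16) for tok in data[1::2]]
    breaks = [m == "\u00f7" for m in markers]  # U+00F7 is the division sign
    return code_points, breaks


cps, breaks = parse_break_test("\u00f7 200D \u00d7 0308 \u00f7 2764 \u00f7 # comment")
# breaks[2] is the boundary after the combining diaeresis, which is the
# position under discussion.
print([hex(cp) for cp in cps], breaks)
```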


More specifically, both failures are at the position right after the "combining diaeresis". My implementation does not mark it as a break, whereas the test data does. The reference implementation, as expected, agrees with the test data.


However, looking at the test case and the UAX[2], this does not look correct. Rule 4 (WB4) reduces the sequence as follows:
ZWJ Extend Glue_After_Zwj -> ZWJ Glue_After_Zwj
Then, according to rule 3c (WB3c), there should be no break opportunity between them. The reference implementation, however, applies rule 999 here, which I believe is incorrect.
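
The reading described above can be sketched as follows. This is only an illustration of that interpretation (rule 4 applied as a rewrite first, then rule 3c on the result), not libunibreak's code and not the reference implementation. The property table is hand-written from the comments in the two test lines and covers only those four code points. The names WORD_BREAK, collapse_wb4 and breaks_under_this_reading are hypothetical, and rule 4's exception for CR, LF and Newline is left out because none of them occur here.

```python
# Hand-written table, taken from the test-line comments (assumption:
# ZWJ_FE and Extend_FE are treated simply as ZWJ and Extend).
WORD_BREAK = {
    0x200D: "ZWJ",
    0x0308: "Extend",
    0x2764: "Glue_After_Zwj",
    0x1F466: "E_Base_GAZ",  # shown as EBG in the test data
}

IGNORED = {"Extend", "Format", "ZWJ"}


def collapse_wb4(props):
    """Rule 4 as a rewrite: X (Extend | Format | ZWJ)* -> X.

    Returns the surviving (original_index, property) pairs.
    """
    kept = []
    for i, prop in enumerate(props):
        if kept and prop in IGNORED:
            continue  # absorbed into the preceding X
        kept.append((i, prop))
    return kept


def breaks_under_this_reading(code_points):
    """Map each surviving boundary (index of its right-hand character)
    to True (break) or False (no break)."""
    kept = collapse_wb4([WORD_BREAK[cp] for cp in code_points])
    result = {}
    for (_, left), (idx, right) in zip(kept, kept[1:]):
        if left == "ZWJ" and right in {"Glue_After_Zwj", "E_Base_GAZ"}:
            result[idx] = False  # rule 3c: ZWJ x (Glue_After_Zwj | EBG)
        else:
            result[idx] = True   # rule 999: otherwise break
    return result


# Under this reading, the boundary before index 2 is "no break" in both
# tests, while the test data expects a break there.
print(breaks_under_this_reading([0x200D, 0x0308, 0x2764]))
print(breaks_under_this_reading([0x200D, 0x0308, 0x1F466]))
```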


Am I missing anything, or is this an issue with the reference test data and reference implementation?

Thanks,
Tom.

[1]: https://github.com/adah1972/libunibreak
[2]: http://www.unicode.org/reports/tr29/#WB1
