Hi,

hb_icu_unicode_decompose() uses ICU's u_strlen() to get the number of Unicode codepoints in the normalized buffer. However, u_strlen() returns the number of UChars in the buffer, and UChar is equivalent to uint16_t. This means that we don't get the right number of codepoints when the buffer contains surrogate pairs, which eventually causes an infinite loop during decomposition. For example, if the function is called like:

  hb_codepoint_t a, b;
  hb_icu_unicode_decompose(0/*unused*/, 0x1f1ef /* REGIONAL INDICATOR SYMBOL LETTER J */, &a, &b, 0/*unused*/);

then it returns TRUE with *a == 0x1f1ef. This leads to an infinite loop in decompose(). The attached patch fixes the problem.

Thanks,
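
To make the mismatch concrete without depending on ICU, here is a minimal standalone sketch. The helper names count_code_units() and count_code_points() are hypothetical and are not part of HarfBuzz or ICU; the first mimics a u_strlen-style count, and the second counts codepoints assuming well-formed, NUL-terminated UTF-16.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of 16-bit code units before the NUL
// terminator. This is the quantity a u_strlen-style count yields.
static std::size_t count_code_units(const std::uint16_t *s) {
  std::size_t n = 0;
  while (s[n]) ++n;
  return n;
}

// Hypothetical helper: number of codepoints, assuming the buffer is
// NUL-terminated UTF-16. A lead surrogate followed by a trail surrogate
// is counted as one codepoint; everything else counts as one unit each.
static std::size_t count_code_points(const std::uint16_t *s) {
  std::size_t pos = 0, n = 0;
  while (s[pos]) {
    bool lead = (s[pos] & 0xFC00) == 0xD800;
    bool pair = lead && (s[pos + 1] & 0xFC00) == 0xDC00;
    pos += pair ? 2 : 1;
    ++n;
  }
  return n;
}
```

U+1F1EF is encoded in UTF-16 as the surrogate pair 0xD83C 0xDDEF, so for that buffer the first helper reports 2 while the second reports 1.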

From 8a5e8604a76efc8032d69e9f2356867d5bd78a6a Mon Sep 17 00:00:00 2001
From: Kenichi Ishibashi <[email protected]>
Date: Fri, 28 Oct 2011 15:28:19 +0900
Subject: [PATCH] Count codepoints instead of calling u_strlen.

---
 src/hb-icu.cc |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/src/hb-icu.cc b/src/hb-icu.cc
index 0f5ed1c..71cba35 100644
--- a/src/hb-icu.cc
+++ b/src/hb-icu.cc
@@ -228,7 +228,12 @@ hb_icu_unicode_decompose (hb_unicode_funcs_t *ufuncs HB_UNUSED,
     return FALSE;
 
   normalized[len] = 0;
-  len = u_strlen (normalized);
+  len = 0;
+  int pos = 0;
+  while (normalized[pos]) {
+    U16_FWD_1_UNSAFE(normalized, pos);
+    ++len;
+  }
 
   if (len == 1) {
     U16_GET_UNSAFE (normalized, 0, *a);
-- 
1.7.3.1
