Hi,

hb_icu_unicode_decompose() uses ICU's u_strlen() to get the number of
Unicode codepoints in the normalized buffer. However, u_strlen() returns
the number of UChars in the buffer, and UChar is equivalent to uint16_t.
This means we don't get the right number of codepoints when the buffer
contains surrogate pairs, which eventually causes an infinite loop during
decomposition. For example, if the function is called like this:

  hb_codepoint_t a, b;
  hb_icu_unicode_decompose(0 /* unused */,
                           0x1f1ef /* REGIONAL INDICATOR SYMBOL LETTER J */,
                           &a, &b, 0 /* unused */);

then it returns TRUE with *a == 0x1f1ef, i.e. the codepoint appears to
decompose into itself. This leads to an infinite loop in decompose(). The
attached patch fixes the problem.
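
To illustrate the difference without depending on ICU, here is a minimal
sketch in plain C. The helper names count_code_units() and
count_code_points() are hypothetical stand-ins: the first mimics what
u_strlen() counts, the second mimics the loop in the patch (it also checks
for a trail surrogate, which U16_FWD_1_UNSAFE does not do).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for u_strlen(): counts 16-bit code units
 * up to (not including) the terminating 0. */
static size_t count_code_units(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}

/* Hypothetical stand-in for the patched loop: counts codepoints by
 * stepping over a lead surrogate together with its trail surrogate. */
static size_t count_code_points(const uint16_t *s)
{
    size_t pos = 0, n = 0;
    while (s[pos] != 0) {
        int is_lead  = (s[pos] & 0xFC00) == 0xD800;     /* 0xD800..0xDBFF */
        int is_trail = (s[pos + 1] & 0xFC00) == 0xDC00; /* 0xDC00..0xDFFF */
        pos += (is_lead && is_trail) ? 2 : 1;
        ++n;
    }
    return n;
}
```

For U+1F1EF, whose UTF-16 encoding is the pair 0xD83C 0xDDEF, the first
helper reports 2 and the second reports 1, which is the mismatch described
above.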

Thanks,
From 8a5e8604a76efc8032d69e9f2356867d5bd78a6a Mon Sep 17 00:00:00 2001
From: Kenichi Ishibashi <[email protected]>
Date: Fri, 28 Oct 2011 15:28:19 +0900
Subject: [PATCH] Count codepoints instead of calling u_strlen.

---
 src/hb-icu.cc |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/src/hb-icu.cc b/src/hb-icu.cc
index 0f5ed1c..71cba35 100644
--- a/src/hb-icu.cc
+++ b/src/hb-icu.cc
@@ -228,7 +228,12 @@ hb_icu_unicode_decompose (hb_unicode_funcs_t *ufuncs HB_UNUSED,
     return FALSE;
 
   normalized[len] = 0;
-  len = u_strlen (normalized);
+  len = 0;
+  int pos = 0;
+  while (normalized[pos]) {
+    U16_FWD_1_UNSAFE(normalized, pos);
+    ++len;
+  }
 
   if (len == 1) {
     U16_GET_UNSAFE (normalized, 0, *a);
-- 
1.7.3.1

