Hi Behdad,

While complex-script shaping is obviously far more interesting, in practice there is a lot of very simple ASCII text on the web. So what would you think of adding a minor optimization that looks like it could give us about a 10% gain when shaping ASCII text with simple fonts? The idea is to make hb_buffer_add check whether any non-ASCII characters have been added to the buffer; if none have, there's no need to run the normalization pass.

(Of course, there are plenty of non-ASCII characters that could also be present without normalization becoming relevant, but I didn't want to make the check any more expensive than a simple character-code comparison, and optimizing performance of ASCII-only runs will benefit a lot of real-world text for minimal effort.)
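To illustrate the idea outside of HarfBuzz itself, here is a minimal standalone sketch of the same incremental check. The names toy_buffer and needs_normalization are hypothetical and not part of the HarfBuzz API; the sketch only assumes, as the patch does, that a code point at or below U+007F never needs normalizing.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for hb_codepoint_t.
typedef uint32_t codepoint_t;

// Hypothetical toy buffer (NOT HarfBuzz's hb_buffer_t) that tracks,
// while characters are added, whether any of them is outside ASCII.
struct toy_buffer {
  std::vector<codepoint_t> codepoints;
  bool have_non_ascii = false;

  void reset () {
    codepoints.clear ();
    have_non_ascii = false;  // must be cleared together with the contents
  }

  void add (codepoint_t cp) {
    codepoints.push_back (cp);
    // A single comparison per character; once any code point above
    // U+007F has been seen, the flag stays set until reset().
    have_non_ascii |= cp > 0x7F;
  }
};

// Hypothetical hook: normalization would run only when the flag is set.
static bool needs_normalization (const toy_buffer &b) {
  return b.have_non_ascii;
}
```

The flag costs one comparison per added character and is cleared in reset(), mirroring the patch's changes to hb_buffer_t::add and hb_buffer_t::reset; scanning the buffer once just before shaping would give the same answer, but at the cost of a second pass over the text.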

This was prompted by profile data such as http://people.mozilla.com/~bgirard/cleopatra/?report=c2e6bea3647461c0675e59441b78c0f5c409ac0d (see https://bugzilla.mozilla.org/show_bug.cgi?id=762710#c25), which relates to the layout of a large, almost purely ASCII document. It shows the normalization pass, which we know is redundant for ASCII-only text, contributing around 10% of the total shaping time. With this patch, that time simply vanishes from the profile.

JK

diff --git a/src/hb-buffer-private.hh b/src/hb-buffer-private.hh
index 9864ca2..6378458 100644
--- a/src/hb-buffer-private.hh
+++ b/src/hb-buffer-private.hh
@@ -95,6 +95,8 @@ struct hb_buffer_t {
   bool in_error; /* Allocation failed */
   bool have_output; /* Whether we have an output buffer going on */
   bool have_positions; /* Whether we have positions */
+  bool have_non_ascii; /* Whether any non-ASCII characters are present;
+                          if not, we don't need to normalize */
 
   unsigned int idx; /* Cursor into ->info and ->pos arrays */
   unsigned int len; /* Length of ->info and ->pos arrays */
diff --git a/src/hb-buffer.cc b/src/hb-buffer.cc
index db4edce..1626e6b 100644
--- a/src/hb-buffer.cc
+++ b/src/hb-buffer.cc
@@ -152,6 +152,7 @@ hb_buffer_t::reset (void)
   in_error = false;
   have_output = false;
   have_positions = false;
+  have_non_ascii = false;
 
   idx = 0;
   len = 0;
@@ -179,6 +180,8 @@ hb_buffer_t::add (hb_codepoint_t  codepoint,
   glyph->mask = mask;
   glyph->cluster = cluster;
 
+  have_non_ascii |= codepoint > 0x7f;
+
   len++;
 }
 
@@ -557,7 +560,8 @@ hb_buffer_get_empty (void)
 
     true, /* in_error */
     true, /* have_output */
-    true  /* have_positions */
+    true, /* have_positions */
+    false /* have_non_ascii */
   };
 
   return const_cast<hb_buffer_t *> (&_hb_buffer_nil);
diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
index d1e1d6c..945bd98 100644
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -500,10 +500,11 @@ hb_ot_shape_internal (hb_ot_shape_context_t *c)
 
   hb_ensure_native_direction (c->buffer);
 
-  _hb_ot_shape_normalize (c->font, c->buffer,
-                         c->plan->shaper->normalization_preference ?
-                         c->plan->shaper->normalization_preference (c->plan) :
-                         HB_OT_SHAPE_NORMALIZATION_MODE_DEFAULT);
+  if (c->buffer->have_non_ascii)
+    _hb_ot_shape_normalize (c->font, c->buffer,
+                           c->plan->shaper->normalization_preference ?
+                           c->plan->shaper->normalization_preference (c->plan) :
+                           HB_OT_SHAPE_NORMALIZATION_MODE_DEFAULT);
 
   hb_ot_shape_setup_masks (c);
 
_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
