Final note: Please file separate bugs for any individual optimization you think is worth performing (or is an obvious improvement).
Thanks,
behdad

On 03/26/2010 01:25 PM, Behdad Esfahbod wrote:
> Sorry for replying so late. I saw a few replies implying that the developer
> time to implement a (to me, unmeasurably) useful feature has been spent
> already so I should go ahead and commit it. There are various flaws with that
> argument:
>
> - It ignores the fact that writing a patch is a small part of the time spent
> on a change. Ignoring the maintainer review time as well as future
> maintenance. If you think I should commit without spending significant time
> on it, well, there's a reason you're not the maintainer :P. In short, it's
> the maintainer that is taking the risk, not you or the patch author. Guess
> why I'm replying this late? Because reading 18 messages and 20 patches takes
> time. Time I could spend on fixing a bug that has a measurable impact at
> least.
>
> - It also assumes that the patch is ready, and useful. The original patch
> series had various flaws. A few I list:
>
> * Introduce 256 new relocations!
>
> * Inlined a public function, but just to make an indirect function call
> instead. What's the point of inlining then?!
>
> * Had unknown impacts on systems with higher function call overhead.
>
> * Was not tested in real-life situations. Perf tests are not realistic.
> Calling g_utf8_next_char a million times in a loop is nothing like real-life.
> In real life strings that are processed are really short. Memory cache
> effects make any micro-optimization you make look like noise.
>
> * Changed the semantics of the glib UTF-8 functions. Dealing with UTF-8
> coming from outside world is very sensitive matter security-wise. There's
> backward compatibility also. Can't just decide to return a different value
> from now on.
>
> * The construct borrowed from glibmm, as beautiful as it is, is WRONG for
> 6-byte-long UTF-8. It just doesn't work. We historically support those
> sequences.
>
>
> That said. I'm not being unfair to anyone here. I personally am a utf-8
> microoptimizing geek myself. See for example this blogpost:
>
> http://mces.blogspot.com/2008/04/utf-8-bit-manipulation.html
>
> So I'm not even willing to commit my own optimization to that code without
> seeing real-world numbers first.
>
> Another idea, now that people are measuring: What about this:
>
> static const int utf8_mask_data[7] = {
>   0, 0x7f, 0x1f, 0x0f, 0x07, 0x03, 0x01
> };
>
> #define UTF8_COMPUTE(Char, Mask, Len) \
>   G_STMT_BEGIN { \
>     Len = utf8_skip_data[(guchar)(Char)]; \
>     Mask = utf8_mask_data[Len]; \
>     if (G_UNLIKELY ((guchar)(Char) >= 0xfe)) \
>       Len = -1; \
>   } G_STMT_END
>
>
>
> behdad
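
For anyone who wants to measure the table-driven idea outside of glib, here is a minimal standalone sketch. It is not the glib code: every sketch_* name is hypothetical, and the skip table is a stand-in built on the assumption that bytes below 0xc0 and the bytes 0xfe/0xff map to length 1, with lead bytes 0xc0 and up mapping to 2 through 6. Only the mask table and the ">= 0xfe" check are taken from the message.

```c
#include <assert.h>

/* Hypothetical stand-in for utf8_skip_data: sequence length by first byte.
 * Assumption: bytes below 0xc0 and the bytes 0xfe/0xff map to 1. */
static unsigned char sketch_skip[256];

/* Mask table from the message, indexed by sequence length 1..6. */
static const int sketch_mask[7] = { 0, 0x7f, 0x1f, 0x0f, 0x07, 0x03, 0x01 };

/* Fill the stand-in skip table; must be called once before decoding. */
static void sketch_init_skip(void)
{
    for (int c = 0; c < 256; c++) {
        if (c < 0xc0)      sketch_skip[c] = 1;
        else if (c < 0xe0) sketch_skip[c] = 2;
        else if (c < 0xf0) sketch_skip[c] = 3;
        else if (c < 0xf8) sketch_skip[c] = 4;
        else if (c < 0xfc) sketch_skip[c] = 5;
        else if (c < 0xfe) sketch_skip[c] = 6;
        else               sketch_skip[c] = 1;
    }
}

/* Function form of the proposed macro: two table lookups and one
 * rarely-taken branch for the bytes 0xfe and 0xff. */
static void sketch_utf8_compute(unsigned char c, int *mask, int *len)
{
    assert(sketch_skip[0] == 1); /* catches a forgotten sketch_init_skip() */
    *len = sketch_skip[c];
    *mask = sketch_mask[*len];
    if (c >= 0xfe)
        *len = -1;
}

/* Decode one code point from a NUL-terminated buffer. Returns -1 for a
 * 0xfe/0xff first byte or for a missing/invalid continuation byte. */
static long sketch_decode(const unsigned char *p)
{
    int mask, len;

    sketch_utf8_compute(p[0], &mask, &len);
    if (len < 0)
        return -1;

    long wc = p[0] & mask;
    for (int i = 1; i < len; i++) {
        if ((p[i] & 0xc0) != 0x80)
            return -1;
        wc = (wc << 6) | (p[i] & 0x3f);
    }
    return wc;
}
```

Call sketch_init_skip() once, then sketch_decode() on a NUL-terminated buffer. Note that with the skip table assumed here, a stray continuation byte (0x80 to 0xbf) in first position comes out with length 1 rather than -1, so a caller that cares would need an extra check. Overlong forms are not rejected either. Both are the kind of semantic detail that wants checking before benchmarking anything.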