Hi all,
We still have two bundled extensions for working with strings in different encodings: ext/mbstring and ext/iconv. While working on bug #79200[1], I've noticed that the implementation of many of the iconv_*() functions is rather suboptimal. This is mostly because iconv() is meant only for character encoding *conversion*. ext/iconv builds several other useful string functions on top of it, but it cannot really optimize them, because the extension knows nothing about the character encodings involved.

For instance, iconv_strlen() is basically implemented by converting the input string to UCS-4 and then simply counting the UCS-4 characters. mb_strlen(), on the other hand, makes use of length tables (where appropriate), so in many typical cases it does not even need to convert the string. Some quick benchmarks of getting the length of UTF-8 strings show that mb_strlen() is roughly 10 times faster than iconv_strlen(). It would be trivial to improve the iconv_strlen() implementation by converting a larger number of characters in one go (instead of only up to two at a time, as it currently does[2]). That would make the function much faster (roughly 3 to 4 times for a 1024-character buffer), but mb_strlen() would obviously still beat it. The situation for the other iconv_*() functions is more or less similar.

However, it seems that iconv() can be much faster than mb_convert_encoding(); quick benchmarks show a factor of 2 to 3. So I wonder whether we would be better off unbundling ext/iconv, but moving the iconv() function (and possibly the convert.iconv.* stream filter) into ext/standard. It shouldn't be hard to update code which uses any of the iconv_*() functions to use the respective mb_*() functions instead, and users who can't do this, or don't want to for whatever reason, could still use the iconv package available from PECL. Users who switch to mbstring, however, would likely get better performance for their applications.
For core developers, unbundling ext/iconv would obviously save the time needed to maintain both extensions. For users learning PHP, and also for new code, it would be beneficial not to have to decide which of these extensions to use: for character encoding conversion, iconv() would be preferable; for more general string functionality, ext/mbstring.

Thoughts?

[1] <https://bugs.php.net/79200>
[2] <https://github.com/php/php-src/blob/php-7.4.3/ext/iconv/iconv.c#L714>

--
Christoph M. Becker

--
PHP Internals - PHP Runtime Development Mailing List