Hi all,

I rewrote is_numeric_string/unicode to be faster and to change a few behaviors. The changes are:

1) Previously, large numbers (very long ones, or "1e500") that became INF were not recognized as numeric (Bug #26349), which is not the behavior anywhere else.

2) Leading whitespace before a hex number, or before a number starting with "." (" .123"), also caused the string not to be recognized.

3) Hex strings were limited to LONG_MAX, and in scripts (the parser) to ULONG_MAX. I added a zend_hex_strtod() function to handle numbers > LONG_MAX in both places. Judging from previous comments like "strtod() messes up hex numbers," it seems there was a desire to support them. :-)

4) A small change: the string "0x" was considered non-numeric before, but it is now a partial match on the "0" (basically to get a more accurate error level/message from zend_parse_parameters(), for example).
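To illustrate the idea behind changes 3 and 4, here is a minimal sketch in C. It is not the zend_hex_strtod() from the patch; the name hex_to_double() and its signature are made up for this example. It accumulates hex digits into a double, so the value is not capped at LONG_MAX, and it treats a bare "0x" as a partial match on the leading "0".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper for illustration only (not the patch's code).
 * Parses an optionally "0x"-prefixed hex string into a double.
 * If endptr is non-NULL, it receives a pointer to the first character
 * that was not consumed (or s itself if nothing was parsed). */
static double hex_to_double(const char *s, const char **endptr)
{
    const char *p;
    double value = 0.0;
    int any = 0;

    assert(s != NULL);
    p = s;

    /* Skip the prefix only when a hex digit follows it. For a bare "0x",
     * the prefix is left alone, so the loop below consumes just the "0". */
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')
            && isxdigit((unsigned char)p[2])) {
        p += 2;
    }

    for (; isxdigit((unsigned char)*p); p++) {
        int c = (unsigned char)*p;
        int digit = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;

        /* Accumulating in a double avoids integer overflow; precision
         * is limited to 53 bits, as with any double. */
        value = value * 16.0 + digit;
        any = 1;
    }

    if (endptr != NULL) {
        *endptr = any ? p : s;
    }
    return value;
}
```

With this sketch, "0x1F" yields 31.0 with the end pointer at the terminating NUL, while "0x" yields 0.0 with the end pointer left on the "x", which is the kind of partial match a caller could use to choose a more accurate error message.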
Now the performance...

The errno stuff has been removed from is_numeric_* (and optimized in the parser) to save function calls with thread-safe libraries (are they used even when ZTS is disabled?). In my tests on Windows, I saw a 5-15% improvement with longs (less with more digits; on 64-bit systems it could be slower at 12-15+ digits, but those aren't common). (With HEAD, everything I checked was consistent, but in 5.2 a few random long tests were slower; must be some compiler weirdness? :-/)

So not much difference there from these changes, BUT doubles are over *twice* as fast, and non-numeric string comparisons are up to nearly 3 times faster! (Slightly less % improvement in Unicode mode.) Non-numeric strings are detected very quickly, which may be the more significant gain, since is_numeric_* is always used on them (from compare_function(), zendi_smart_strcmp(), etc.). Also, no number conversion is done if there's no corresponding pointer to fill, which is much faster when code is "just checking."

The larger inline function did increase the binary size by a few K...

The patches:
http://realplain.com/php/is_numeric.diff
http://realplain.com/php/is_numeric_5_2.diff

You can see that I changed MAX_LENGTH_OF_LONG to be accurate on 32-/64-bit, which my changes rely on. I also fixed a few places where memory calculations that use it could, in theory, be too small.

I wanted to get this in before Ilia's Thursday deadline (if it's still on :-)), in case it can be applied soon. Finally, I don't know if you'd want to use them as is, but I've attached possible NEWS file updates about this stuff.

Thoughts, questions?

Thanks.
Matt
Index: NEWS
===================================================================
RCS file: /repository/php-src/NEWS,v
retrieving revision 1.2027.2.547.2.426
diff -u -r1.2027.2.547.2.426 NEWS
--- NEWS 12 Dec 2006 07:38:04 -0000 1.2027.2.547.2.426
+++ NEWS 12 Dec 2006 13:29:19 -0000
@@ -5,12 +5,15 @@
   the page. (Ilia)
 - Added new function, sys_get_temp_dir(). (Hartmut)
 - Added missing object support to file_put_contents(). (Ilia)
+- Added support for hex numbers of any size. (Matt Wilmas)
 - Changed double-to-string utilities to use BSD implementation. (Dmitry, Tony)
 - Updated bundled libcURL to version 7.16.0 in the Windows distro. (Edin)
 - Updated timezone database to version 2006.16. (Derick)
 - cgi.* and fastcgi.* directives are moved to INI subsystem. The new directive
   cgi.check_shebang_line can be used to ommiting checnk for "#! /usr/bin/php"
   line. (Dmitry).
+- Improved performance of numeric string detection and non-identical comparison
+  of strings. (Matt Wilmas)
 - Windows related optimizations (Dmitry, Stas)
   . COM initialization/deinitialization are done only if necessary
   . removed unnecessary checks for ISREG file and corresponding stat() calls
@@ -182,6 +185,8 @@
   (Ilia,Dmitry, Matt Wilmas)
 - Fixed bug #29840 (is_executable() does not honor safe_mode_exec_dir setting).
   (Ilia)
+- Fixed bug #26349 (is_numeric() returns false for strings with more than 308
+  digits). (Matt Wilmas)
 
 02 Nov 2006, PHP 5.2.0
 - Updated bundled OpenSSL to version 0.9.8d in the Windows distro. (Edin)
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php