Hi!
> Here are some statistics:
> - average haystack length: 624.2
> - average needle length: 1.9! -> 63% of needles have length 1
> - average length of haystacks shorter than the average: 41.0 -> 85% of all haystacks
> - average length of haystacks longer than the average: 5685.11
I think it would be interesting to see the same statistics excluding 1-char needles, since in that case it should do a one-char lookup (by the way, if we don't already do that at the C level, it might be a good idea to).
> Although strpos implements a fix for that, some other functions don't. My idea is then to implement ZEND_MEMNSTR once again in this shape:
>
>   if (needle_len == 1)
>       just a linear sweep
>   else if (haystack_len < 5000)   (5000 is arbitrary; maybe more tests are needed to choose a good value)
>       the original implementation (as it is the best one in this case)
>   else
>       BM/KMP (I think BM will be better in this case, as some people suggested)
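The proposed dispatch could look roughly like the C sketch below. It is an illustration under stated assumptions, not Zend code: the names `memnstr_hybrid`, `scan_memchr`, `scan_horspool` and `HYPOTHETICAL_BM_THRESHOLD` are made up; the short-haystack path is a memchr-plus-memcmp scan in the spirit of the existing implementation rather than a copy of it; and the large-haystack path uses Boyer-Moore-Horspool (Boyer-Moore with only the bad-character shift), swapped in for full BM/KMP.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-over point: 5000 is the arbitrary value from the
 * proposal and would need benchmarking. */
#define HYPOTHETICAL_BM_THRESHOLD 5000

/* Short haystacks: memchr() finds candidate first bytes, memcmp() confirms.
 * Caller guarantees 1 <= nlen <= hlen. */
static const char *scan_memchr(const char *h, size_t hlen,
                               const char *n, size_t nlen)
{
    const char *p = h;
    const char *last = h + hlen - nlen;   /* last valid start position */
    while (p <= last) {
        p = memchr(p, (unsigned char)n[0], (size_t)(last - p) + 1);
        if (p == NULL) {
            return NULL;
        }
        if (memcmp(p, n, nlen) == 0) {
            return p;
        }
        p++;
    }
    return NULL;
}

/* Large haystacks: Boyer-Moore-Horspool. The shift for each byte is its
 * distance from the needle's last position (needle length if absent). */
static const char *scan_horspool(const char *h, size_t hlen,
                                 const char *n, size_t nlen)
{
    size_t shift[UCHAR_MAX + 1];
    for (size_t i = 0; i <= UCHAR_MAX; i++) {
        shift[i] = nlen;
    }
    for (size_t i = 0; i + 1 < nlen; i++) {
        shift[(unsigned char)n[i]] = nlen - 1 - i;
    }
    size_t pos = 0;
    while (pos + nlen <= hlen) {
        if (memcmp(h + pos, n, nlen) == 0) {
            return h + pos;
        }
        /* shift by the byte under the window's last position */
        pos += shift[(unsigned char)h[pos + nlen - 1]];
    }
    return NULL;
}

/* Dispatcher following the three cases of the proposal. */
static const char *memnstr_hybrid(const char *h, size_t hlen,
                                  const char *n, size_t nlen)
{
    if (nlen == 0) {
        return h;                         /* empty needle matches at 0 */
    }
    if (nlen > hlen) {
        return NULL;
    }
    if (nlen == 1) {
        /* one-char lookup: plain linear sweep via memchr() */
        return memchr(h, (unsigned char)n[0], hlen);
    }
    if (hlen < HYPOTHETICAL_BM_THRESHOLD) {
        return scan_memchr(h, hlen, n, nlen);
    }
    return scan_horspool(h, hlen, n, nlen);
}
```

The Horspool path pays for a 256-entry shift table on every call, which is one reason a threshold seems sensible: on the short haystacks that dominate the statistics above, that setup would likely outweigh any savings.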
I'm not sure very big haystacks are really worth the trouble: how many of them are actually used? It may be interesting to see medians instead of averages for that. But I think len=1 is worth having a special case for.
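To compare medians with averages, and to exclude 1-char needles as suggested earlier, something like the following could be run over logged search calls. This is a minimal sketch: the `struct search_sample` record and `haystack_stats()` are hypothetical, and how the calls get logged is left open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of one logged search call. */
struct search_sample {
    size_t haystack_len;
    size_t needle_len;
};

static int cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Computes mean and median haystack length. If skip_single is true,
 * samples with 1-char needles are ignored. Returns the number of samples
 * used; when it returns 0, *mean and *median are left untouched. */
static size_t haystack_stats(const struct search_sample *s, size_t count,
                             bool skip_single, double *mean, double *median)
{
    size_t *lens = malloc(count * sizeof *lens);
    size_t used = 0;
    double sum = 0.0;
    if (lens == NULL) {
        return 0;
    }
    for (size_t i = 0; i < count; i++) {
        if (skip_single && s[i].needle_len == 1) {
            continue;
        }
        lens[used++] = s[i].haystack_len;
        sum += (double)s[i].haystack_len;
    }
    if (used > 0) {
        qsort(lens, used, sizeof *lens, cmp_size);
        *mean = sum / (double)used;
        /* odd count: middle element; even count: mean of the two middles */
        *median = (used % 2) ? (double)lens[used / 2]
                             : ((double)lens[used / 2 - 1] + (double)lens[used / 2]) / 2.0;
    }
    free(lens);
    return used;
}
```

With toy haystack lengths {10, 20, 30, 40, 10000} (illustrative only, not the measured data), the mean is 2020 while the median is 30, which is the kind of skew a few very large haystacks can introduce into an average.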
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php