Hi!

Here are some statistics:
- average haystack length: 624.2
- average needle length: 1.9 ! -> 63% of needles of length 1
- avg length of haystacks shorter than avg: 41.0 -> 85% of all haystacks
- avg length of haystacks  longer than avg: 5685.11

I think it would be interesting to see same excluding 1-char needles since in this case it should do one-char lookup (btw, if we don't do it on C level, it might be a good idea to).


Although strpos implements fix for that, some other functions don't. My idea
is than to implement ZEND_MEMNSTR once again in shape:

if (needle_len = 1)
    here just linear sweep
else if haystack_len < 5000 (5000 is arbitrary - maybe some more tests
needed to choose good value)
    original implementation (as it is the best one in this case)
else
    BM/KMP (i think BM will be better in this case, as some people
suggested)

I'm not sure very big haystacks really worth the trouble - how many of them are used? It may be interesting to see medians instead of averages for that. But len=1 I think worth having special case.
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to