Hello,
Here goes first diff - to keep it simple and avoid confusion I will
post them one-by-one after the previous is accepted/rejected.
Optimization of STRRPOS/STRRIPOS for PHP_5_3.
2 things are changed:
* implementation of search loop from theta(NM) to o(N), omega(NM) -
it is now the same as original zend_memnstr - using zend_memrchr
instead of memchr function. Here is comparison of performance: http://
212.85.117.53/gsoc/index.php?
option=com_content&view=article&id=48:strrpos-patch-
proposal&catid=36:patches&Itemid=56
This implementation is faster than original for any reasonable input
(a little slower for "aaab" in "aaaaaaaab", but that is really bad
case), and I think in this case we should not bother with KMP/BM as I
haven't found any call to this functions with haystack longer than
100 (NOTE: on MY server, over ~1h - this can be different for
wikipedia guys;))). In case somebody needs KMP - I have it
implemented to work "backwards" thus suitable for strrpos.
* second thing is pre-lowering the haystack in strripos in case of
needle len=1 - why is explained here: http://212.85.117.53/gsoc/
index.php?option=com_content&view=article&id=49:strripos-possible-
patches&catid=36:patches&Itemid=56
Here is the diff:
--------------------------------------------------
diff -u -d -r1.445.2.14.2.69.2.22 string.c
--- ext/standard/string.c 27 May 2008 10:29:33 -0000
1.445.2.14.2.69.2.22
+++ ext/standard/string.c 20 Jun 2008 08:08:34 -0000
@@ -1876,18 +1876,19 @@
if (needle_len == 1) {
/* Single character search can shortcut memcmps */
- while (e >= p) {
- if (*e == *needle) {
- RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
- }
- e--;
- }
+ if ((e = (char *)zend_memrchr(p, *needle, e - p +
1)) != NULL) {
+ RETURN_LONG(e - p + (offset > 0 ? offset : 0));
+ }
RETURN_FALSE;
}
while (e >= p) {
- if (memcmp(e, needle, needle_len) == 0) {
- RETURN_LONG(e - p + (offset > 0 ? offset : 0));
+ if ((e = (char *)zend_memrchr(p, *needle, e - p +
1)) != NULL) {
+ if (!memcmp(needle, e, needle_len)) {
+ RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
+ }
+ } else {
+ RETURN_FALSE;
}
e--;
}
@@ -1925,6 +1926,11 @@
if ((haystack_len == 0) || (needle_len == 0)) {
RETURN_FALSE;
}
+
+ haystack_dup = estrndup(haystack, haystack_len);
+ php_strtolower(haystack_dup, haystack_len);
+ needle_dup = estrndup(needle, needle_len);
+ php_strtolower(needle_dup, needle_len);
if (needle_len == 1) {
/* Single character search can shortcut memcmps
@@ -1932,34 +1938,36 @@
if (offset >= 0) {
if (offset > haystack_len) {
php_error_docref(NULL TSRMLS_CC,
E_NOTICE, "Offset is greater than the length of haystack string");
+ efree(needle_dup);
+ efree(haystack_dup);
RETURN_FALSE;
}
- p = haystack + offset;
- e = haystack + haystack_len - 1;
+ p = haystack_dup + offset;
+ e = haystack_dup + haystack_len - 1;
} else {
- p = haystack;
+ p = haystack_dup;
if (-offset > haystack_len || offset < -
INT_MAX) {
php_error_docref(NULL TSRMLS_CC,
E_NOTICE, "Offset is greater than the length of haystack string");
+ efree(needle_dup);
+ efree(haystack_dup);
RETURN_FALSE;
}
- e = haystack + haystack_len + offset;
+ e = haystack_dup + haystack_len + offset;
}
- /* Borrow that ord_needle buffer to avoid repeatedly
tolower()ing needle */
- *ord_needle = tolower(*needle);
+
while (e >= p) {
- if (tolower(*e) == *ord_needle) {
+ if (*e == *needle_dup) {
+ efree(needle_dup);
+ efree(haystack_dup);
RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
}
e--;
}
+ efree(needle_dup);
+ efree(haystack_dup);
RETURN_FALSE;
}
- needle_dup = estrndup(needle, needle_len);
- php_strtolower(needle_dup, needle_len);
- haystack_dup = estrndup(haystack, haystack_len);
- php_strtolower(haystack_dup, haystack_len);
-
if (offset >= 0) {
if (offset > haystack_len) {
efree(needle_dup);
@@ -1985,11 +1993,13 @@
}
while (e >= p) {
- if (memcmp(e, needle_dup, needle_len) == 0) {
- efree(haystack_dup);
- efree(needle_dup);
- RETURN_LONG(e - p + (offset > 0 ? offset : 0));
- }
+ if ((e = (char *)zend_memrchr(p, *needle, e - p +
1)) != NULL) {
+ if (!memcmp(needle, e, needle_len)) {
+ efree(haystack_dup);
+ efree(needle_dup);
+ RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
+ }
+ }
e--;
}
----------------------------------------------
Tested and PASS. Valgrind does not show any memory leaks, but if
somebody more competent can confirm it, than I will really appreciate
it.
Cheers,
Michal
PS If somebody can take a look at that: http://212.85.117.53/gsoc/
index.php?option=com_content&view=article&id=52:small-
buginconsistence&catid=38:other&Itemid=58 it would be also great. If
it should be the same in both functions than I have patch and test
cases for it prepared.