Hello,
Here goes first diff - to keep it simple and avoid confusion I will post them one-by-one after the previous is accepted/rejected.
Optimization of STRRPOS/STRRIPOS for PHP_5_3.

2 things are changed:
* implementation of search loop from theta(NM) to o(N), omega(NM) - it is now the same as original zend_memnstr - using zend_memrchr instead of memchr function. Here is comparison of performance: http:// 212.85.117.53/gsoc/index.php? option=com_content&view=article&id=48:strrpos-patch- proposal&catid=36:patches&Itemid=56 This implementation is faster than original for any reasonable input (a little slower for "aaab" in "aaaaaaaab", but that is really bad case), and I think in this case we should not bother with KMP/BM as I haven't found any call to this functions with haystack longer than 100 (NOTE: on MY server, over ~1h - this can be different for wikipedia guys;))). In case somebody needs KMP - I have it implemented to work "backwards" thus suitable for strrpos.

* second thing is pre-lowering the haystack in strripos in case of needle len=1 - why is explained here: http://212.85.117.53/gsoc/ index.php?option=com_content&view=article&id=49:strripos-possible- patches&catid=36:patches&Itemid=56

Here is the diff:
--------------------------------------------------
diff -u -d -r1.445.2.14.2.69.2.22 string.c
--- ext/standard/string.c 27 May 2008 10:29:33 -0000 1.445.2.14.2.69.2.22
+++ ext/standard/string.c       20 Jun 2008 08:08:34 -0000
@@ -1876,18 +1876,19 @@

        if (needle_len == 1) {
                /* Single character search can shortcut memcmps */
-               while (e >= p) {
-                       if (*e == *needle) {
- RETURN_LONG(e - p + (offset > 0 ? offset : 0));
-                       }
-                       e--;
-               }
+ if ((e = (char *)zend_memrchr(p, *needle, e - p + 1)) != NULL) {
+                       RETURN_LONG(e - p + (offset > 0 ? offset : 0));
+               }
                RETURN_FALSE;
        }

        while (e >= p) {
-               if (memcmp(e, needle, needle_len) == 0) {
-                       RETURN_LONG(e - p + (offset > 0 ? offset : 0));
+ if ((e = (char *)zend_memrchr(p, *needle, e - p + 1)) != NULL) {
+                       if (!memcmp(needle, e, needle_len)) {
+ RETURN_LONG(e - p + (offset > 0 ? offset : 0));
+                       }
+               } else {
+                       RETURN_FALSE;
                }
                e--;
        }
@@ -1925,6 +1926,11 @@
        if ((haystack_len == 0) || (needle_len == 0)) {
                RETURN_FALSE;
        }
+
+       haystack_dup = estrndup(haystack, haystack_len);
+       php_strtolower(haystack_dup, haystack_len);
+       needle_dup = estrndup(needle, needle_len);
+       php_strtolower(needle_dup, needle_len);

        if (needle_len == 1) {
                /* Single character search can shortcut memcmps
@@ -1932,34 +1938,36 @@
                if (offset >= 0) {
                        if (offset > haystack_len) {
php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Offset is greater than the length of haystack string");
+                               efree(needle_dup);
+                               efree(haystack_dup);
                                RETURN_FALSE;
                        }
-                       p = haystack + offset;
-                       e = haystack + haystack_len - 1;
+                       p = haystack_dup + offset;
+                       e = haystack_dup + haystack_len - 1;
                } else {
-                       p = haystack;
+                       p = haystack_dup;
if (-offset > haystack_len || offset < - INT_MAX) { php_error_docref(NULL TSRMLS_CC, E_NOTICE, "Offset is greater than the length of haystack string");
+                               efree(needle_dup);
+                               efree(haystack_dup);
                                RETURN_FALSE;
                        }
-                       e = haystack + haystack_len + offset;
+                       e = haystack_dup + haystack_len + offset;
                }
- /* Borrow that ord_needle buffer to avoid repeatedly tolower()ing needle */
-               *ord_needle = tolower(*needle);
+
                while (e >= p) {
-                       if (tolower(*e) == *ord_needle) {
+                       if (*e == *needle_dup) {
+                               efree(needle_dup);
+                               efree(haystack_dup);
RETURN_LONG(e - p + (offset > 0 ? offset : 0));
                        }
                        e--;
                }
+               efree(needle_dup);
+               efree(haystack_dup);
                RETURN_FALSE;
        }

-       needle_dup = estrndup(needle, needle_len);
-       php_strtolower(needle_dup, needle_len);
-       haystack_dup = estrndup(haystack, haystack_len);
-       php_strtolower(haystack_dup, haystack_len);
-
        if (offset >= 0) {
                if (offset > haystack_len) {
                        efree(needle_dup);
@@ -1985,11 +1993,13 @@
        }

        while (e >= p) {
-               if (memcmp(e, needle_dup, needle_len) == 0) {
-                       efree(haystack_dup);
-                       efree(needle_dup);
-                       RETURN_LONG(e - p + (offset > 0 ? offset : 0));
-               }
+ if ((e = (char *)zend_memrchr(p, *needle, e - p + 1)) != NULL) {
+                       if (!memcmp(needle, e, needle_len)) {
+                               efree(haystack_dup);
+                               efree(needle_dup);
+ RETURN_LONG(e - p + (offset > 0 ? offset : 0));
+                       }
+               }
                e--;
        }
----------------------------------------------

Tested and PASS. Valgrind does not show any memory leaks, but if somebody more competent can confirm it, than I will really appreciate it.
Cheers,
Michal

PS If somebody can take a look at that: http://212.85.117.53/gsoc/ index.php?option=com_content&view=article&id=52:small- buginconsistence&catid=38:other&Itemid=58 it would be also great. If it should be the same in both functions than I have patch and test cases for it prepared.

Reply via email to