Hi,

Both patches seem ok to me. Please ask for a CVS account and prepare yourself to commit (when your mentor says so) :) In the second patch maybe you could use zend_memrchr() as well, but you should get the same results. Also I think that bundling the backwards KMP would be a nice addition. People use PHP for other things than regular web scripting..

Nuno


----- Original Message ----- From: "Michal Dziemianko" <[EMAIL PROTECTED]>
To: <internals@lists.php.net>
Sent: Friday, June 20, 2008 9:36 AM
Subject: Re: [PHP-DEV] Algorithm Optimizations - string search


Hello,
Here goes first diff - to keep it simple and avoid confusion I will
post them one-by-one after the previous is accepted/rejected.
Optimization of STRRPOS/STRRIPOS for PHP_5_3.

2 things are changed:
* implementation of search loop from theta(NM) to o(N), omega(NM) -
it is now the same as original zend_memnstr - using zend_memrchr
instead of memchr function. Here is comparison of performance: http://
212.85.117.53/gsoc/index.php?
option=com_content&view=article&id=48:strrpos-patch-
proposal&catid=36:patches&Itemid=56
This implementation is faster than original for any reasonable input
(a little slower for "aaab" in "aaaaaaaab", but that is really bad
case), and I think in this case we should not bother with KMP/BM as I
haven't found any call to this functions with haystack longer than
100 (NOTE: on MY server, over ~1h - this can be different for
wikipedia guys;))). In case somebody needs KMP - I have it
implemented to work "backwards" thus suitable for strrpos.

* second thing is pre-lowering the haystack in strripos in case of
needle len=1 - why is explained here: http://212.85.117.53/gsoc/
index.php?option=com_content&view=article&id=49:strripos-possible-
patches&catid=36:patches&Itemid=56

Here is the diff:
--------------------------------------------------
diff -u -d -r1.445.2.14.2.69.2.22 string.c
--- ext/standard/string.c       27 May 2008 10:29:33 -0000
1.445.2.14.2.69.2.22
+++ ext/standard/string.c       20 Jun 2008 08:08:34 -0000
@@ -1876,18 +1876,19 @@

        if (needle_len == 1) {
                /* Single character search can shortcut memcmps */
-               while (e >= p) {
-                       if (*e == *needle) {
-                               RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
-                       }
-                       e--;
-               }
+               if ((e = (char *)zend_memrchr(p, *needle,  e - p +
1)) != NULL) {
+                       RETURN_LONG(e - p + (offset > 0 ? offset : 0));
+               }
                RETURN_FALSE;
        }

        while (e >= p) {
-               if (memcmp(e, needle, needle_len) == 0) {
-                       RETURN_LONG(e - p + (offset > 0 ? offset : 0));
+               if ((e = (char *)zend_memrchr(p, *needle, e - p +
1)) != NULL) {
+                       if (!memcmp(needle, e, needle_len)) {
+                               RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
+                       }
+               } else {
+                       RETURN_FALSE;
                }
                e--;
        }
@@ -1925,6 +1926,11 @@
        if ((haystack_len == 0) || (needle_len == 0)) {
                RETURN_FALSE;
        }
+
+       haystack_dup = estrndup(haystack, haystack_len);
+       php_strtolower(haystack_dup, haystack_len);
+       needle_dup = estrndup(needle, needle_len);
+       php_strtolower(needle_dup, needle_len);

        if (needle_len == 1) {
                /* Single character search can shortcut memcmps
@@ -1932,34 +1938,36 @@
                if (offset >= 0) {
                        if (offset > haystack_len) {
                                php_error_docref(NULL TSRMLS_CC,
E_NOTICE, "Offset is greater than the length of haystack string");
+                               efree(needle_dup);
+                               efree(haystack_dup);
                                RETURN_FALSE;
                        }
-                       p = haystack + offset;
-                       e = haystack + haystack_len - 1;
+                       p = haystack_dup + offset;
+                       e = haystack_dup + haystack_len - 1;
                } else {
-                       p = haystack;
+                       p = haystack_dup;
                        if (-offset > haystack_len || offset < -
INT_MAX) {
                                php_error_docref(NULL TSRMLS_CC,
E_NOTICE, "Offset is greater than the length of haystack string");
+                               efree(needle_dup);
+                               efree(haystack_dup);
                                RETURN_FALSE;
                        }
-                       e = haystack + haystack_len + offset;
+                       e = haystack_dup + haystack_len + offset;
                }
-               /* Borrow that ord_needle buffer to avoid repeatedly
tolower()ing needle */
-               *ord_needle = tolower(*needle);
+
                while (e >= p) {
-                       if (tolower(*e) == *ord_needle) {
+                       if (*e == *needle_dup) {
+                               efree(needle_dup);
+                               efree(haystack_dup);
                                RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
                        }
                        e--;
                }
+               efree(needle_dup);
+               efree(haystack_dup);
                RETURN_FALSE;
        }

-       needle_dup = estrndup(needle, needle_len);
-       php_strtolower(needle_dup, needle_len);
-       haystack_dup = estrndup(haystack, haystack_len);
-       php_strtolower(haystack_dup, haystack_len);
-
        if (offset >= 0) {
                if (offset > haystack_len) {
                        efree(needle_dup);
@@ -1985,11 +1993,13 @@
        }

        while (e >= p) {
-               if (memcmp(e, needle_dup, needle_len) == 0) {
-                       efree(haystack_dup);
-                       efree(needle_dup);
-                       RETURN_LONG(e - p + (offset > 0 ? offset : 0));
-               }
+               if ((e = (char *)zend_memrchr(p, *needle, e - p +
1)) != NULL) {
+                       if (!memcmp(needle, e, needle_len)) {
+                               efree(haystack_dup);
+                               efree(needle_dup);
+                               RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
+                       }
+               }
                e--;
        }
----------------------------------------------

Tested and PASS. Valgrind does not show any memory leaks, but if
somebody more competent can confirm it, than I will really appreciate
it.
Cheers,
Michal

PS If somebody can take a look at that: http://212.85.117.53/gsoc/
index.php?option=com_content&view=article&id=52:small-
buginconsistence&catid=38:other&Itemid=58 it would be also great. If
it should be the same in both functions than I have patch and test
cases for it prepared.


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to