>Number:         3160
>Category:       mod_rewrite
>Synopsis:       processing rewrite maps is slow
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    apache
>State:          open
>Class:          change-request
>Submitter-Id:   apache
>Arrival-Date:   Wed Oct  7 06:40:01 PDT 1998
>Last-Modified:
>Originator:     [EMAIL PROTECTED]
>Organization:
apache
>Release:        1.3.1
>Environment:
SunOS xlink103 5.6 Generic_105181-03 sun4u sparc SUNW,Ultra-2
>Description:
Rewrite maps can be used to modify requested URLs. I am talking
about the txt: and dbm: varieties that are stored in plain text
files or dbm databases.

Every key is first searched linearly in a memory array. If it
isn't found there, a txt: map is parsed with a regular expression
that splits each line into two words. If the key is found in the
file it is appended to the memory array, but if it is not found
the map operation simply fails.

As a result the txt: map is parsed very often, especially when
there are keys that are not in the map. For large rewrite maps
this can use up a tremendous amount of CPU cycles (with a map
of 3000 entries parsing of the file took about 90% of the CPU).

The situation with dbm: maps is somewhat better as there is
no file that needs to be parsed. Still, nonexistent keys are
searched with slow I/O operations and searching the memory
array linearly can take significant time.

>How-To-Repeat:

>Fix:
I made three modifications to mod_rewrite:

A txt: file is now parsed with simple string functions instead
of heavy regular expression parsing (thereby fixing a bug that
prevented the usage of keys containing the ',' character).
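
To illustrate the idea (this is only a minimal sketch, not the
submitted diff), a map line can be split with strspn() and
strcspn() alone. The function name split_map_line and the treatment
of '#' lines as comments are assumptions made for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the actual mod_rewrite code.
 * Splits one line of a txt: map in place into key and value.
 * Returns 1 on success, 0 for blank lines, comment lines, or
 * lines that have a key but no value. */
int split_map_line(char *line, char **key, char **value)
{
    static const char ws[] = " \t\r\n";
    char *p;

    assert(line != NULL && key != NULL && value != NULL);

    p = line + strspn(line, ws);    /* skip leading whitespace */
    if (*p == '\0' || *p == '#')
        return 0;                   /* blank line or comment */

    *key = p;
    p += strcspn(p, ws);            /* advance to the end of the key */
    if (*p == '\0')
        return 0;                   /* key without a value */
    *p++ = '\0';                    /* terminate the key */

    p += strspn(p, ws);             /* skip separating whitespace */
    if (*p == '\0')
        return 0;                   /* only whitespace after the key */

    *value = p;
    p[strcspn(p, ws)] = '\0';       /* cut the value at next whitespace */
    return 1;
}
```

Because only whitespace is treated as a separator here, a key such
as "www.example.com,alt" passes through unchanged.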

The memory array now also stores failed lookups (as an empty
string), which are returned as NULL to the upper layer.
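
A rough sketch of such negative caching follows. It is not the
submitted diff: the type and function names (map_cache,
cache_store, cache_lookup) and the fixed sizes are assumptions
chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_MAX 64    /* assumed capacity of the memory array */
#define FIELD_MAX 128   /* assumed maximum key/value length + 1 */

struct cache_entry {
    char key[FIELD_MAX];
    char value[FIELD_MAX];  /* "" marks a key known to be absent */
};

struct map_cache {
    struct cache_entry e[CACHE_MAX];
    size_t n;               /* number of entries in use */
};

/* Record a lookup result. value == NULL records a failed lookup,
 * stored as an empty string. Returns 1 if stored, 0 if the cache
 * is full or a field is too long. */
int cache_store(struct map_cache *c, const char *key, const char *value)
{
    assert(c != NULL && key != NULL);
    if (value == NULL)
        value = "";
    if (c->n == CACHE_MAX || strlen(key) >= FIELD_MAX
        || strlen(value) >= FIELD_MAX)
        return 0;
    strcpy(c->e[c->n].key, key);
    strcpy(c->e[c->n].value, value);
    c->n++;
    return 1;
}

/* Returns 1 if the key is cached, 0 if the map must be consulted.
 * On a hit, *value is the mapped string, or NULL when the cached
 * entry is a remembered failure. */
int cache_lookup(const struct map_cache *c, const char *key,
                 const char **value)
{
    size_t i;

    assert(c != NULL && key != NULL && value != NULL);
    for (i = 0; i < c->n; i++) {
        if (strcmp(c->e[i].key, key) == 0) {
            *value = c->e[i].value[0] != '\0' ? c->e[i].value : NULL;
            return 1;
        }
    }
    return 0;
}
```

The caller re-reads the map only when cache_lookup() returns 0, so
a nonexistent key costs one parse or dbm fetch instead of one per
request.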

The memory array lookups are cached in a 4-way hash table
with LRU functionality. For small rewrite maps this is slower,
though, because I use a simple but expensive hash function.
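
As an illustration only, a 4-way set-associative cache with LRU
replacement could look like the sketch below. It is not taken from
the diffs: the names (hcache, hcache_get, hcache_put), the number
of sets, and the djb2-style hash are all assumptions. The cache
must start zero-initialized, since a stamp of 0 marks a free way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NSETS   64      /* assumed number of sets */
#define NWAYS   4       /* four entries per set */
#define KEY_MAX 128     /* assumed maximum key length + 1 */

struct way {
    char key[KEY_MAX];
    const char *value;      /* not copied; the caller owns it */
    unsigned long stamp;    /* last use; 0 means the way is free */
};

struct hcache {
    struct way set[NSETS][NWAYS];
    unsigned long clock;    /* incremented on every access */
};

/* Simple string hash (djb2 style), reduced to a set index. */
unsigned hash_key(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return (unsigned)(h % NSETS);
}

/* Sets *found to 1 and returns the cached value on a hit;
 * sets *found to 0 and returns NULL on a miss. */
const char *hcache_get(struct hcache *c, const char *key, int *found)
{
    struct way *w;
    int i;

    assert(c != NULL && key != NULL && found != NULL);
    w = c->set[hash_key(key)];
    for (i = 0; i < NWAYS; i++) {
        if (w[i].stamp != 0 && strcmp(w[i].key, key) == 0) {
            w[i].stamp = ++c->clock;    /* mark most recently used */
            *found = 1;
            return w[i].value;
        }
    }
    *found = 0;
    return NULL;
}

/* Insert or update a key. If the set is full, the way with the
 * oldest stamp (least recently used) is overwritten. */
void hcache_put(struct hcache *c, const char *key, const char *value)
{
    struct way *w, *victim;
    int i;

    assert(c != NULL && key != NULL);
    if (strlen(key) >= KEY_MAX)
        return;                         /* too long to cache */
    w = c->set[hash_key(key)];
    victim = &w[0];
    for (i = 0; i < NWAYS; i++) {
        if (w[i].stamp != 0 && strcmp(w[i].key, key) == 0) {
            victim = &w[i];             /* update existing entry */
            break;
        }
        if (w[i].stamp < victim->stamp)
            victim = &w[i];             /* free or older way */
    }
    strcpy(victim->key, key);
    victim->value = value;
    victim->stamp = ++c->clock;
}
```

A lookup touches at most four entries regardless of map size, at
the price of hashing every key, which matches the observation that
small maps do not benefit.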

Using a rewrite map for mass-virtual-hosting is now faster
by two orders of magnitude on our servers.

The diffs are available from <[EMAIL PROTECTED]>


>Audit-Trail:
>Unformatted:
[In order for any reply to be added to the PR database, ]
[you need to include <[EMAIL PROTECTED]> in the Cc line ]
[and leave the subject line UNCHANGED.  This is not done]
[automatically because of the potential for mail loops. ]
[If you do not include this Cc, your reply may be ig-   ]
[nored unless you are responding to an explicit request ]
[from a developer.                                      ]
[Reply only with text; DO NOT SEND ATTACHMENTS!         ]


