>Number: 3160
>Category: mod_rewrite
>Synopsis: processing rewrite maps is slow
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: apache
>State: open
>Class: change-request
>Submitter-Id: apache
>Arrival-Date: Wed Oct 7 06:40:01 PDT 1998
>Last-Modified:
>Originator: [EMAIL PROTECTED]
>Organization: apache
>Release: 1.3.1
>Environment: SunOS xlink103 5.6 Generic_105181-03 sun4u sparc SUNW,Ultra-2
>Description:
Rewrite maps can be used to modify requested URLs. I am talking about the txt: and dbm: varieties that are stored in plain text files or dbm databases.
Every key is first searched linearly in an in-memory array. If it isn't found there, a txt: map file is parsed, using a regular expression to split each line into two words. If the key is found in the file it is appended to the memory array, but if it is not found the map lookup simply fails and nothing is remembered. As a result the txt: map file is parsed very often, especially when requests use keys that are not in the map. For large rewrite maps this can use up a tremendous amount of CPU time (with a map of 3000 entries, parsing the file took about 90% of the CPU). The situation with dbm: maps is somewhat better, as there is no file that needs to be parsed. Still, nonexistent keys are looked up with slow I/O operations every time, and searching the memory array linearly can take significant time.
>How-To-Repeat:
>Fix:
I made three modifications to mod_rewrite:
1. A txt: file is now parsed with simple string functions instead of heavy regular-expression parsing. This also fixes a bug that prevented the use of keys containing the ',' character.
2. The memory array now also stores failed lookups (as an empty string), which are returned as NULL to the upper layer.
3. Lookups in the memory array are cached in a 4-way hash table with LRU replacement. For small rewrite maps this is slower, though, as I use a simple, expensive hash function.
Using a rewrite map for mass virtual hosting is now faster by two orders of magnitude on our servers. The diffs are available from <[EMAIL PROTECTED]>
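The first change, splitting txt: map lines with plain string functions rather than a regular expression, might look roughly like the C sketch below. It is not the submitted diff: `parse_map_line`, `lookup_txt_map`, the separator set `MAP_WS` and the 1024-byte line buffer are hypothetical choices for illustration, and treating lines that start with '#' as comments is an assumption of this sketch. Here the key is only ever split on whitespace, so a ',' inside it needs no special treatment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Characters treated as word separators in a txt: map line (assumption). */
static const char MAP_WS[] = " \t\r\n";

/* Hypothetical helper: split one txt: map line in place into its first
 * two words. Returns 1 and sets *key and *value on success; returns 0
 * for blank lines, '#' comment lines and lines without a value.
 * Anything after the second word is ignored. */
static int parse_map_line(char *line, char **key, char **value)
{
    char *p = line + strspn(line, MAP_WS);      /* skip leading blanks */

    if (*p == '\0' || *p == '#')
        return 0;

    *key = p;                                   /* first word */
    p += strcspn(p, MAP_WS);
    if (*p == '\0')
        return 0;                               /* key but no value */
    *p++ = '\0';

    p += strspn(p, MAP_WS);                     /* skip the separator */
    if (*p == '\0')
        return 0;

    *value = p;                                 /* second word */
    p[strcspn(p, MAP_WS)] = '\0';               /* cut trailing text/newline */
    return 1;
}

/* Hypothetical linear scan of an open txt: map for `key`. Copies the
 * value into out (truncated to outlen - 1 chars) and returns 1 if found,
 * 0 otherwise. Lines longer than the buffer are not handled specially
 * in this sketch: the remainder would be read as a separate line. */
static int lookup_txt_map(FILE *fp, const char *key, char *out, size_t outlen)
{
    char line[1024];
    char *k, *v;

    assert(fp != NULL && key != NULL && out != NULL && outlen > 0);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_map_line(line, &k, &v) && strcmp(k, key) == 0) {
            snprintf(out, outlen, "%s", v);
            return 1;
        }
    }
    return 0;
}
```

With a map line such as `www.a,b.example /home/ab`, looking up the key `www.a,b.example` copies `/home/ab` into the output buffer. Only `strspn`, `strcspn` and `strcmp` are needed per line, which is the kind of saving the report describes over running a regular expression on every line.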
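The second and third changes, remembering failed lookups as empty strings and caching lookups in a 4-way hash table with LRU replacement, can be sketched as a small set-associative cache. This is an illustration under assumptions, not the patch referenced in the report: the names `map_cache`, `cache_get` and `cache_put`, the 64-bucket size and the djb2-style hash are all hypothetical (the report only says its own hash function is simple but expensive).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SETS 64   /* hypothetical number of hash buckets */
#define CACHE_WAYS 4    /* 4-way: each bucket holds up to 4 entries */

struct cache_entry {
    char *key;             /* NULL: slot unused */
    char *value;           /* "" records a failed (negative) lookup */
    unsigned long stamp;   /* time of last use, for LRU replacement */
};

struct map_cache {
    struct cache_entry sets[CACHE_SETS][CACHE_WAYS];
    unsigned long clock;   /* bumped on every hit or insert */
};

/* Stand-in string hash (djb2-style); the report's own hash is unknown. */
static unsigned long hash_key(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *d = malloc(n);

    if (d != NULL)
        memcpy(d, s, n);
    return d;
}

/* Returns 1 on a cache hit and sets *value (NULL for a cached failed
 * lookup), 0 on a miss, in which case the real map must be consulted. */
static int cache_get(struct map_cache *c, const char *key, const char **value)
{
    struct cache_entry *set = c->sets[hash_key(key) % CACHE_SETS];
    int i;

    assert(key != NULL && value != NULL);
    for (i = 0; i < CACHE_WAYS; i++) {
        if (set[i].key != NULL && strcmp(set[i].key, key) == 0) {
            set[i].stamp = ++c->clock;          /* now most recently used */
            *value = set[i].value[0] != '\0' ? set[i].value : NULL;
            return 1;
        }
    }
    return 0;
}

/* Records the result of a map lookup; value == NULL stores a failed
 * lookup as "". Reuses a free slot or the slot already holding this key,
 * otherwise evicts the least recently used entry of the bucket. */
static void cache_put(struct map_cache *c, const char *key, const char *value)
{
    struct cache_entry *set = c->sets[hash_key(key) % CACHE_SETS];
    struct cache_entry *victim = &set[0];
    int i;

    assert(key != NULL);
    for (i = 0; i < CACHE_WAYS; i++) {
        if (set[i].key == NULL || strcmp(set[i].key, key) == 0) {
            victim = &set[i];
            break;
        }
        if (set[i].stamp < victim->stamp)
            victim = &set[i];                   /* older than current pick */
    }
    free(victim->key);
    free(victim->value);
    victim->key = dup_string(key);
    victim->value = dup_string(value != NULL ? value : "");
    if (victim->key == NULL || victim->value == NULL) {  /* out of memory */
        free(victim->key);
        free(victim->value);
        victim->key = victim->value = NULL;             /* leave slot unused */
        return;
    }
    victim->stamp = ++c->clock;
}
```

A caller would try `cache_get` first and, only on a miss, read the txt: file or dbm: database and store the outcome with `cache_put`, failures included, so repeated requests for unknown keys stop reaching the file. A lookup then costs one hash plus at most four string comparisons instead of a scan of the whole array; for a small map the hash itself can dominate, which matches the report's remark about small maps.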
