Hello,

 

My company has just upgraded our Apache/mod_perl and supporting software versions, and we are now seeing a strange phenomenon.  A piece of code which has always worked speedily up to this point, now occasionally, but predictably, takes 10 times longer to execute.  The code fragment in question is this.

 

-----

my $quoted_link = qr{

    ((href|action)\s*=\s*)  # matching a link in an href or action attribute ($1)

    (["'])                  # starting link delimiter (' or ") ($3)

    (

      (https?://\w+?\.chr(istian)?book\.com)?   # optional domain name (old or new) ($4)

      /                             # / separating domain from path

      (?![^\3\#\?>]+?\.(exe|sit|pdf|ra?m|mp3|wax|css|js))   # skip non-HTML file types

      [^\3\#\?>]+?                  # anything that can be in a path (no anchors, query strings, etc.

    )                       # everything before the ticket

    (

      [\?\#]                        # start of a query string or anchor reference

      [^\3]+?                       # query string or anchor reference

    )?                      # everything after the ticket ($8)

    \3                      # ending link delimiter

}six;

...

$$text_ref =~ s{$quoted_link}{$1$3$4/$ticket$8$3}gx;

---

The purpose of this code is to tag each URL on a web page with a session ID.  The $text_ref variable is a scalar reference to the page content, and the $ticket variable contains the ID.

 

In an effort to debug why the slow-down was occurring, I replaced the substitution in the snippet above with the following.

---

use Time::HiRes qw(time);

my $start = time;

my $elapsed;

my $count = 0;

print "<p>Text length = " . length( $$text_ref ) . "</p>";

while ( $$text_ref =~ /$quoted_link/g ) {

  $count += 1;

  $elapsed = time - $start;

  print "<p>link_filter2: $count, $& $elapsed</p>";

}

---

This code prints out the cumulative time the regex is taking to match as it encounters links on a web page.  The problem continued to show itself with this change, but I still don't know how to pinpoint why it's happening.

 

Here are some other noteworthy items pertaining to this problem.

 

* This problem only occurs under Apache/mod_perl.  I saved the HTML of a normal page and a slow web page to files, and copied the code above into a small command line script.  When I executed the script against the files, the slow page took about the right

amount of time to process relative to the normal page (as opposed to 10 times as long).  (BTW: the slow page is actually smaller than the normal page, so it should be processed more quickly, which it did in the command line script.)

 

* Our infrastructure upgrade made involved the following software updates.

   -- Before upgrade: Apache 1.3.29, mod_perl 1.27, Perl 5.6.1, Solaris 5.8

   -- After upgrade: Apache 1.3.31, mod_perl 1.29, Perl 5.8.3, Solaris 5.9

 

So, I am wondering...

 

* Can anyone suggest reasons why this code might be executing so slowly?

* Can anyone suggest potential improvements to the regex so it will execute faster?

* Does anyone know of changes between the software versions mentioned above that could lead to this behavior?

 

Thanks in advance for your help.

 

- Simon

 

-----------------------------------------------

Simon Miner

Applications Engineer

Christianbook.com

E: [EMAIL PROTECTED]

T: (978) 573-2233

F: (978) 573-8233

-----------------------------------------------

 

Reply via email to