>The code won't compile but it will if references to
>classes which have nothing to do with the pattern
>matching are removed. If this is no good I'll isolate
>the ORO code for you but it will have to be later,
>just ask.

There's too much extraneous stuff in there for me to spend the time
working through it.  Whenever you have the time to whittle it down
to the bare essentials, please post again.  In the meantime,
independent of the ultimate solution, I recommend that you not
instantiate a new compiler and matcher every time parse() is called,
and that you not compile SERVER_ACTIVE_HTML_PATTERN every time parse()
is called.  You should compile SERVER_ACTIVE_HTML_PATTERN only
once and reuse the pattern; otherwise, you are wasting cycles.
Likewise, you should only instantiate the matcher once and use
it in parse() as needed, otherwise you are again wasting cycles,
this time in object creation.  The normal way of doing this is to
make the compiled pattern a static variable (compiled with READ_ONLY_MASK
if it is to be shared between threads) compiled in a static initializer
and to make the matcher a non-static member, unless instances of the
class will never be used in different threads, in which case making it
static will do fine.  Finally, the timings must be taken around individual
calls to contains() in order to determine whether it is the matching that
is consuming the time (change the loop to while(true) {...} and time
around matcher.contains(), breaking if the result is false and recording
each match time in addition to the total for all calls).  There's
a lot of other stuff going on in MarkedUpHTML() and parse() that is
contributing to execution time (forget about the 3,000 seconds;
12 seconds alone is way too much to find 4 matches in an HTML page).
All that said, the culprit is still probably a suboptimal regular
expression, as you suspected, but it helps to eliminate these other
factors that perturb the measurements and our ability to isolate the
behavior.
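
To make the suggestions above concrete, here is a rough sketch of the
compile-once, reuse-the-matcher, time-each-call structure.  So that it
runs without extra jars, it uses java.util.regex as a stand-in for the
ORO classes; with ORO the corresponding pieces would be a Pattern
compiled once via Perl5Compiler.compile(expr, Perl5Compiler.READ_ONLY_MASK)
and a Perl5Matcher whose contains() is called in the loop.  The class
name TimedScan, the method timeEachMatch() and the tag expression are
made up for illustration; your real SERVER_ACTIVE_HTML_PATTERN goes in
its place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimedScan {
    // Compiled exactly once, when the class is loaded, and shared by all
    // instances. The expression is a hypothetical placeholder, not the
    // real SERVER_ACTIVE_HTML_PATTERN from the original code.
    private static final Pattern TAG_PATTERN = Pattern.compile("<[^>]*>");

    // One matcher per instance, created once and reused by every call.
    // It is non-static because a matcher must not be shared by threads.
    private final Matcher matcher = TAG_PATTERN.matcher("");

    /** Returns the elapsed nanoseconds of each successful match, in order. */
    List<Long> timeEachMatch(String html) {
        List<Long> times = new ArrayList<>();
        matcher.reset(html); // reuse the matcher on the new input
        while (true) {
            // Time only the matching call itself, nothing else.
            long start = System.nanoTime();
            boolean found = matcher.find();
            long elapsed = System.nanoTime() - start;
            if (!found) {
                break; // no more matches in this input
            }
            times.add(elapsed);
        }
        return times;
    }

    public static void main(String[] args) {
        TimedScan scan = new TimedScan();
        List<Long> times =
            scan.timeEachMatch("<html><body><p>hi</p></body></html>");
        long total = 0;
        for (int i = 0; i < times.size(); i++) {
            System.out.println("match " + (i + 1) + ": " + times.get(i) + " ns");
            total += times.get(i);
        }
        // Total covers the successful matches only.
        System.out.println("total: " + total + " ns for "
            + times.size() + " matches");
    }
}
```

If the per-match times come out small but the overall run is still
slow, the time is going somewhere other than the matcher; if one or
two matches dominate, that points back at the expression.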

daniel

