Thanks very much for your help. In the early stages of developing the pattern I actually used (.*) for my enclosed content but was unaware of SINGLELINE_MASK and so I used the costly alternation instead.
All the best, Janek Bogucki --- "Daniel F. Savarese" <[EMAIL PROTECTED]> wrote: > > In message > <[EMAIL PROTECTED]>, > =?iso-8859-1?q > ?Janek=20Bogucki?= writes: > >Here is a reduced, standalone test class, > >Performance.java. > > Thanks. It's a lot easier to see what the pattern > is now and what > match attempts are taking the most time. This is > the culprit: > > > /* > > * enclosed content > > */ > > "((\\s|.)*)" + > > Alternations are expensive, but much more so when a > subpattern has the > potential to match the entire remaining input on > every backtrack. Instead > of this you should use: > > "(.*)" + > > and compile the pattern with > Perl5Compiler.SINGLELINE_MASK so that > . will match newlines: > > _pattern = > compiler.compile(SERVER_ACTIVE_HTML_PATTERN, > > Perl5Compiler.CASE_INSENSITIVE_MASK | > > Perl5Compiler.SINGLELINE_MASK | > > Perl5Compiler.READ_ONLY_MASK); > > All the matches should now take less than can be > measured with > System.currentTimeMillis(). > > The problem with writing regular expressions is that > you often have > to know something about how they are implemented in > order to avoid > writing inefficient ones. For example, the > expression you used would > be fine with the Awk package because AwkCompiler > will reduce all equivalent > regular expressions to the same DFA (at the price > that it only works with > 8-bit character input). But the Perl package (and > Perl itself) treats > the expressions as NFAs and has to dynamically > search the state space of > possible matches rather than relying on > predetermined state transitions. > "Mastering Regular Expressions" by Jeffrey Friedl > does a pretty good job > of explaining how regular expressions typically work > and how to avoid > writing "slow" expressions. The trick is to avoid > constructs that will > cause a lot of backtracking. > > daniel > > ____________________________________________________________ Do You Yahoo!? Get your free @yahoo.co.uk address at http://mail.yahoo.co.uk or your free @yahoo.ie address at http://mail.yahoo.ie