Daniel Naber <daniel.na...@languagetool.org> wrote:

> Hi,
>
> there's a regex that makes tests quite slow in PatternTestTools.java:
>
>    CHAR_SET_PATTERN =
> Pattern.compile("(\\(\\?-i\\))?.*(?<!\\\\)\\[^?([^\\]]+)\\]")
>
> I don't fully understand it, does it need to be that complicated? If I
> simplify it like this:
>
>    CHAR_SET_PATTERN = Pattern.compile("\\[^?([^\\]]+)\\]");
>
> The tests become much faster (45 seconds -> 8 seconds for Polish when
> running just the disambiguation tests).
>
> Regards
>   Daniel

Hi Daniel

I have not had time to look at what this regexp is used for, but
glancing at it, I see that it contains a zero-width negative
lookbehind, i.e. the (?<!…) part.  This can be very slow, I think.
At least in Vim, zero-width lookbehinds are documented as very slow
(see  :help \@<!  in Vim).  I suspect that it's the same for other
regexp engines.  Perhaps the regexp can be rewritten in such a way as
to avoid the zero-width lookbehind.
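
As a rough illustration, here is a minimal Java sketch comparing the
original pattern, Daniel's simplified one, and one possible
lookbehind-free variant. The class name, the helper method, the
NO_LOOKBEHIND pattern and the use of find() are my own assumptions, not
taken from PatternTestTools.java, and I have not timed any of this
against the real rule files.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch class; not part of LanguageTool.
class CharSetPatternSketch {

    // Pattern as quoted from PatternTestTools.java in Daniel's message.
    static final Pattern ORIGINAL =
        Pattern.compile("(\\(\\?-i\\))?.*(?<!\\\\)\\[^?([^\\]]+)\\]");

    // Daniel's simplified pattern: no ".*" prefix and no lookbehind.
    static final Pattern SIMPLIFIED = Pattern.compile("\\[^?([^\\]]+)\\]");

    // Assumed alternative: instead of a lookbehind, consume either the
    // start of the input or one character that is not a backslash.
    static final Pattern NO_LOOKBEHIND =
        Pattern.compile("(?:^|[^\\\\])\\[^?([^\\]]+)\\]");

    // Returns the captured character-set body, or null if nothing matches.
    // Using find() here is an assumption about how the pattern is applied.
    static String charSet(Pattern pattern, int group, String regexText) {
        Matcher m = pattern.matcher(regexText);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String plain = "x[abc]y";     // unescaped bracket
        String escaped = "x\\[abc]y"; // the text x\[abc]y, bracket escaped

        System.out.println("" + charSet(ORIGINAL, 2, plain));        // abc
        System.out.println("" + charSet(SIMPLIFIED, 1, plain));      // abc
        System.out.println("" + charSet(NO_LOOKBEHIND, 1, plain));   // abc

        System.out.println("" + charSet(ORIGINAL, 2, escaped));      // null
        System.out.println("" + charSet(SIMPLIFIED, 1, escaped));    // abc
        System.out.println("" + charSet(NO_LOOKBEHIND, 1, escaped)); // null
    }
}
```

So the lookbehind seems to be there to skip a "[" that is escaped with
a backslash. The simplified pattern no longer makes that distinction,
which may or may not matter for the existing rules.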

Regards
Dominique
