On Mon, Mar 16, 2009 at 11:54 AM, Scott Blum <[email protected]> wrote:

> 85: Why the change from [a-zA-Z] to [a-z]?  FWIW, the tests should check
> the validatity of "C:\foo" and "c:\foo".
>

The pattern was already being compared in a case-insensitive manner, so it
was pointless to also handle case here.  In fact, the previous formulation
did rule.toLowerCase(), so it was getting changed to [a-za-z].


> 88: unless I'm missing something, this should read:
> String prefix = "^("
>
> Otherwise, you lose the property of checking only start of the URL.  Think
> "http://evildomain.org/evil?http://localhost:88/";.  There should probably
> also be a test case that this does not pass.
>

>From the documentation of Matcher.matches:

>      * @return  <tt>true</tt> if, and only if, the entire region sequence
>      *          matches this matcher's pattern


Ie, it already includes implicit ^$ around the pattern, which is why .* was
added in various places.

I added your suggested tests to verify it and they do pass.  I also added an
extra () around each individual regex to avoid surprises if someone adds a
regex in the future that needs it.


> Also, have you verified that it's actually faster to use a single regex
> than to check multiples?
>

I have not actually compared this particular implementation, but in my
experience it is always faster to build one DFA based on all the patterns
than to build multple ones and try and match each separately.  In addition,
the old formulation was compiling each pattern every time it was checked, so
I am completely confident this is much faster (not that the number of checks
is going to be significant anyway).

-- 
John A. Tamplin
Software Engineer (GWT), Google

--~--~---------~--~----~------------~-------~--~----~
http://groups.google.com/group/Google-Web-Toolkit-Contributors
-~----------~----~----~----~------~----~------~--~---

Reply via email to