Marcus Craske created VALIDATOR-429:
---------------------------------------

             Summary: UrlValidator - path is invalid due to using java.net.URI 
for validation (regression)
                 Key: VALIDATOR-429
                 URL: https://issues.apache.org/jira/browse/VALIDATOR-429
             Project: Commons Validator
          Issue Type: Bug
          Components: Routines
    Affects Versions: 1.6
            Reporter: Marcus Craske


h1. Summary
We've been hit by a bug in a real-world application after upgrading from 1.4.1 
to 1.6: URLs that were previously valid are now rejected. This appears to be 
caused by java.net.URI now being used to validate the path of a URL.

h2. Steps to Reproduce
Our application validates URLs similar to the following:
* http://example.com//_test

This URL is no longer considered valid in 1.6, but the following cases still are:
* http://example.com//test
* http://example.com/_test

h2. Impact
It seems that paths in UrlValidator are being parsed and validated as hostnames 
by java.net.URI, so URLs with an otherwise valid path are rejected.

h2. Technical
It looks like this may have been introduced by the following change:
https://github.com/apache/commons-validator/commit/03bf0d33143ebd13e4f389cd4ecac8aec17c2057

Specifically, the path is now validated using java.net.URI. The usage in 
org.apache.commons.validator.routines.UrlValidator is as follows:
{code}
URI uri = new URI(null,null,path,null);
{code}

It looks like URI tries to parse the start of the path as a hostname when the 
scheme and hostname are not specified.

Example to reproduce:
{code}
new URI(null, null, "//_test", null);   // throws URISyntaxException
{code}

The same example with a hostname specified no longer throws an exception:
{code}
new URI(null, "test", "//_test", null);
{code}

Even though the java.net.URI documentation states that string components can be 
null, the URI string that is built and validated internally differs between the 
two cases. When a hostname is specified, URI internally constructs:
* //test//_test

With no hostname, which is how UrlValidator calls it, the following is 
constructed and validated internally:
* //_test
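
The difference between the two cases can be observed with a small standalone 
program. This is only a sketch: the class name UriPathRepro and the helper 
describe are made up for illustration, and the exact exception message may 
differ between JDK versions.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriPathRepro {

    /**
     * Hypothetical helper: builds a URI with the same four-argument constructor
     * that the issue describes, and reports how java.net.URI parsed the result.
     */
    static String describe(String host, String path) {
        try {
            URI uri = new URI(null, host, path, null);
            return "ok: string=" + uri + " host=" + uri.getHost() + " path=" + uri.getPath();
        } catch (URISyntaxException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // No host: the leading "//" of the path is read as the start of an authority.
        System.out.println(describe(null, "//_test"));
        // Explicit host: the path is kept separate from the authority.
        System.out.println(describe("test", "//_test"));
        // No host, but "test" happens to be a legal hostname, so nothing is thrown.
        System.out.println(describe(null, "//test"));
    }
}
```

In the third call the reported host should be "test", which suggests that 
http://example.com//test only passes because its path happens to look like a 
legal hostname.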

Therefore it looks like there's either a bug in java.net.URI, or its usage is 
not correctly documented.

h2. Fix
A potential fix is to change org.apache.commons.validator.routines.UrlValidator 
to pass an empty string as the hostname. Internally, in java.net.URI, this 
produces:
* ////_test

Thus the hostname is empty, which appears to be accepted, and the correct path 
is validated.

Would this fix be appropriate, or considered too fragile?
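
A minimal sketch of this approach, assuming the empty-host behaviour described 
above holds on the JDK in use. The class EmptyHostPathCheck and the method 
isPathAccepted are hypothetical names and not part of Commons Validator.

```java
import java.net.URI;
import java.net.URISyntaxException;

class EmptyHostPathCheck {

    /**
     * Hypothetical helper: validates a path by handing it to java.net.URI with
     * an empty (not null) host, so the leading "//" of the path is not read as
     * the start of an authority.
     */
    static boolean isPathAccepted(String path) {
        try {
            URI uri = new URI(null, "", path, null);
            // Guard against the path being reinterpreted as something else:
            // the parsed path must be exactly what was passed in.
            return path.equals(uri.getPath());
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPathAccepted("//_test"));
        System.out.println(isPathAccepted("/_test"));
        System.out.println(isPathAccepted("//test"));
    }
}
```

Note that this relies on java.net.URI tolerating an empty authority before a 
non-empty path, which is the fragility the question above is about.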

Alternatively, the fix could be to extract logic similar to java.net.URI's path 
validation, which appears to just check that the characters are valid and 
within a certain range. This logic can be seen in 
java.net.URI.parseHierarchical, which calls checkChars.
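
As a rough illustration of what such a character-only check might look like, 
here is a sketch. It is not a copy of java.net.URI's checkChars: the allowed 
character set below is an assumption based on the path grammar in RFC 3986, 
and the class and method names are hypothetical.

```java
class PathCharCheck {

    // Assumption: punctuation allowed unescaped in a path per RFC 3986
    // (unreserved, sub-delims, ':' and '@'), plus the '/' segment separator.
    private static final String ALLOWED_PUNCTUATION = "-._~!$&'()*+,;=:@/";

    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** Returns true if every character is allowed or part of a %HH escape. */
    static boolean hasOnlyPathChars(String path) {
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            boolean asciiLetterOrDigit = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (asciiLetterOrDigit || ALLOWED_PUNCTUATION.indexOf(c) >= 0) {
                continue;
            }
            // A percent sign must be followed by exactly two hex digits.
            if (c == '%' && i + 2 < path.length()
                    && isHexDigit(path.charAt(i + 1))
                    && isHexDigit(path.charAt(i + 2))) {
                i += 2;
                continue;
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyPathChars("//_test"));
        System.out.println(hasOnlyPathChars("/a b"));
    }
}
```

A check like this never builds or parses a full URI, so a path starting with 
// cannot be mistaken for an authority.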



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
