[ https://issues.apache.org/jira/browse/VALIDATOR-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622078#comment-13622078 ]

Rafał Figas commented on VALIDATOR-318:
---------------------------------------

Hi!

Well, I agree that it is pretty unclear how the underscore should be treated.

I've found this RFC:
http://www.ietf.org/rfc/rfc2181.txt

It was issued about 10 years after RFC 1034 and RFC 1035, and it states:
{quote}
   Several problem areas in the Domain Name System specification
   [RFC1034, RFC1035] have been noted through the years [RFC1123].  This
   document addresses several additional problem areas.
{quote}

and further:
{quote}
   The DNS itself places only one restriction on the particular labels
   that can be used to identify resource records.  That one restriction
   relates to the length of the label and the full name.  The length of
   any one label is limited to between 1 and 63 octets.  A full domain
   name is limited to 255 octets (including the separators).  The zero
   length full name is defined as representing the root of the DNS tree,
   and is typically written and displayed as ".".  Those restrictions
   aside, any binary string whatever can be used as the label of any
   resource record.  Similarly, any binary string can serve as the value
   of any record that includes a domain name as some or all of its value
   (SOA, NS, MX, PTR, CNAME, and any others that may be added).
   Implementations of the DNS protocols must not place any restrictions
   on the labels that can be used.  In particular, DNS servers must not
   refuse to serve a zone because it contains labels that might not be
   acceptable to some DNS client programs.  A DNS server may be
   configurable to issue warnings when loading, or even to refuse to
   load, a primary zone containing labels that might be considered
   questionable, however this should not happen by default.

   Note however, that the various applications that make use of DNS data
   can have restrictions imposed on what particular values are
   acceptable in their environment.  For example, that any binary label
   can have an MX record does not imply that any binary name can be used
   as the host part of an e-mail address. Clients of the DNS can impose
   whatever restrictions are appropriate to their circumstances on the
   values they use as keys for DNS lookup requests, and on the values
   returned by the DNS.  If the client has such restrictions, it is
   solely responsible for validating the data from the DNS to ensure
   that it conforms before it makes any use of that data.

   See also [RFC1123] section 6.1.3.5.
{quote}
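To make the RFC 2181 rule concrete, here is a minimal Java sketch of a check that enforces only the length limits quoted above. It checks nothing about characters, so an underscore passes. The class name DnsNameLength is hypothetical. The sketch makes three assumptions. First, it takes a textual name in which "." always separates labels, with no escaped dots inside a binary label. Second, it counts label sizes in UTF-8 octets. Third, it applies the 255-octet limit to the wire-format encoding (one length octet per label plus the terminating root octet), as RFC 1035 section 3.1 does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: enforces only the RFC 2181 length limits, never characters.
final class DnsNameLength {
    static final int MAX_LABEL_OCTETS = 63;  // RFC 2181: 1..63 octets per label
    static final int MAX_NAME_OCTETS = 255;  // full name, measured in wire format

    static boolean isValidLength(String name) {
        if (name.equals(".")) {
            return true; // zero-length full name: the root of the DNS tree
        }
        // Assumption: one trailing dot just marks a fully qualified name.
        String trimmed = name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
        int wireOctets = 1; // terminating zero-length root label
        // Assumption: "." always separates labels (no escaped dots inside a label).
        for (String label : trimmed.split("\\.", -1)) {
            int octets = label.getBytes(StandardCharsets.UTF_8).length;
            if (octets < 1 || octets > MAX_LABEL_OCTETS) {
                return false; // empty label (e.g. "a..b") or label longer than 63 octets
            }
            wireOctets += 1 + octets; // one length octet plus the label itself
        }
        return wireOctets <= MAX_NAME_OCTETS;
    }

    public static void main(String[] args) {
        System.out.println(isValidLength("www.nasza_sp77.republika.pl")); // true: "_" is not checked
        System.out.println(isValidLength("a".repeat(64) + ".pl"));       // false: 64-octet label
    }
}
```

Measured this way, 255 octets on the wire correspond to at most 253 characters of dotted ASCII text.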

So DNS itself seems to place very few limitations on characters. I therefore
jumped to RFC 1123, mentioned above, and found the following statement:

{quote}
      The syntax of a legal Internet host name was specified in RFC-952
      [DNS:4].  One aspect of host name syntax is hereby changed: the
      restriction on the first character is relaxed to allow either a
      letter or a digit.  Host software MUST support this more liberal
      syntax.

      Host software MUST handle host names of up to 63 characters and
      SHOULD handle host names of up to 255 characters.

      Whenever a user inputs the identity of an Internet host, it SHOULD
      be possible to enter either (1) a host domain name or (2) an IP
      address in dotted-decimal ("#.#.#.#") form.  The host SHOULD check
      the string syntactically for a dotted-decimal number before
      looking it up in the Domain Name System.
{quote}

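Since RFC 952 only allows letters, digits, hyphens and periods in host names,
and RFC 1123 relaxes only the rule for the first character, that would suggest
that the underscore is not allowed in Internet host names, even though it is
allowed in domain names stored in DNS.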
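For comparison, here is a sketch of the RFC 952 host name syntax, including the RFC 1123 relaxation that lets a label start with a digit. HostNameSyntax is a hypothetical class. It assumes ASCII input without a trailing dot and caps each label at 63 characters and the whole name at 255. As RFC 1123 suggests, it runs the dotted-decimal test first, because a string like "192.168.0.1" would also pass the label check.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of RFC 952 host name syntax with the RFC 1123 relaxation.
final class HostNameSyntax {
    // Starts and ends with a letter or digit, hyphens allowed inside, at most 63 characters.
    // There is no "_" anywhere in the allowed set.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?");
    private static final Pattern DOTTED_DECIMAL =
            Pattern.compile("[0-9]{1,3}(?:\\.[0-9]{1,3}){3}");

    // RFC 1123: check for the "#.#.#.#" form before treating the string as a name.
    static boolean isDottedDecimal(String s) {
        if (!DOTTED_DECIMAL.matcher(s).matches()) {
            return false;
        }
        for (String part : s.split("\\.")) {
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    // Assumption: ASCII input, no trailing dot, whole name capped at 255 characters.
    static boolean isHostName(String s) {
        if (s.isEmpty() || s.length() > 255) {
            return false;
        }
        for (String label : s.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false; // empty label, "_" or another bad character, or a leading/trailing "-"
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] inputs = { "192.168.0.1", "3com.com", "www.nasza_sp77.republika.pl" };
        for (String input : inputs) {
            String kind = isDottedDecimal(input) ? "address"
                    : isHostName(input) ? "host name" : "invalid";
            System.out.println(input + " -> " + kind);
        }
    }
}
```

Under these rules, www.nasza_sp77.republika.pl is rejected only because of the "_" in its second label. The same name with "-" in place of "_" would be a valid host name.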

BUT, as you can see, the URL shown in the bug report
(http://www.nasza_sp77.republika.pl/) exists and works (browsers handle it
correctly), so I am wondering whether I overlooked something. Maybe some newer RFC?


                
> isValid returns false for valid URL
> -----------------------------------
>
>                 Key: VALIDATOR-318
>                 URL: https://issues.apache.org/jira/browse/VALIDATOR-318
>             Project: Commons Validator
>          Issue Type: Bug
>    Affects Versions: 1.3.1 Release, 1.4.0 Release
>            Reporter: Rafał Figas
>
> isValid returns false for the following URL:
> http://www.nasza_sp77.republika.pl/
> which seems to be perfectly legal according to RFC 3986:
> http://tools.ietf.org/html/rfc3986#section-2.3
> and the discussions here:
> http://stackoverflow.com/questions/2180465/can-someone-have-a-subdomain-with-an-underscore-in-it
> http://stackoverflow.com/questions/10959757/the-use-of-the-underscore-in-host-names
> Test code:
> {code}
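> import org.apache.commons.validator.routines.UrlValidator; // 1.4.0; in 1.3.1 it is org.apache.commons.validator.UrlValidator
>
> String[] schemes = { "http", "https" };
> UrlValidator validator = new UrlValidator(schemes, UrlValidator.NO_FRAGMENTS);
> // expected true, but returns false because of the "_" in the host name
> return validator.isValid("http://www.nasza_sp77.republika.pl/");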
> {code}
> This is somewhat similar to VALIDATOR-204, but that one was about the path
> part of the URL, while this one is about an underscore in the host name.
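The RFC 3986 argument from the issue description can be sketched the same way. Section 3.2.2 defines reg-name = *( unreserved / pct-encoded / sub-delims ), and unreserved (section 2.3) includes "_". The host is therefore legal URI syntax even though it is not a valid RFC 1123 host name. The same section also says that a registered name intended for lookup in the DNS uses the syntax of RFC 1034 section 3.5 and RFC 1123 section 2.1. RegName is a hypothetical class. It assumes the host has already been extracted from the URL and ignores the IP-literal and IPv4address alternatives of "host".

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the RFC 3986 reg-name rule:
//   reg-name = *( unreserved / pct-encoded / sub-delims )
//   unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
final class RegName {
    private static final Pattern REG_NAME = Pattern.compile(
            // unreserved and sub-delims characters, or "%" followed by two hex digits
            "(?:[A-Za-z0-9\\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*");

    // Assumption: host already extracted from the URL; IP literals not handled.
    static boolean isRegName(String host) {
        return REG_NAME.matcher(host).matches();
    }

    public static void main(String[] args) {
        // Legal URI syntax, even though RFC 952/1123 would reject the "_"
        System.out.println(isRegName("www.nasza_sp77.republika.pl")); // true
        System.out.println(isRegName("bad host.pl"));                 // false: space is not allowed
    }
}
```

So the two readings do not contradict each other. The URI grammar accepts the name, and only the stricter DNS host name syntax rejects it.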

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
