[
https://issues.apache.org/jira/browse/VALIDATOR-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622078#comment-13622078
]
Rafał Figas commented on VALIDATOR-318:
---------------------------------------
Hi!
Well, I agree that it is pretty unclear how the underscore should be treated.
I've found this RFC:
http://www.ietf.org/rfc/rfc2181.txt
It was issued 10 years after RFC 1034 and RFC 1035, and it states:
{quote}
Several problem areas in the Domain Name System specification
[RFC1034, RFC1035] have been noted through the years [RFC1123]. This
document addresses several additional problem areas.
{quote}
and further:
{quote}
The DNS itself places only one restriction on the particular labels
that can be used to identify resource records. That one restriction
relates to the length of the label and the full name. The length of
any one label is limited to between 1 and 63 octets. A full domain
name is limited to 255 octets (including the separators). The zero
length full name is defined as representing the root of the DNS tree,
and is typically written and displayed as ".". Those restrictions
aside, any binary string whatever can be used as the label of any
resource record. Similarly, any binary string can serve as the value
of any record that includes a domain name as some or all of its value
(SOA, NS, MX, PTR, CNAME, and any others that may be added).
Implementations of the DNS protocols must not place any restrictions
on the labels that can be used. In particular, DNS servers must not
refuse to serve a zone because it contains labels that might not be
acceptable to some DNS client programs. A DNS server may be
configurable to issue warnings when loading, or even to refuse to
load, a primary zone containing labels that might be considered
questionable, however this should not happen by default.
Note however, that the various applications that make use of DNS data
can have restrictions imposed on what particular values are
acceptable in their environment. For example, that any binary label
can have an MX record does not imply that any binary name can be used
as the host part of an e-mail address. Clients of the DNS can impose
whatever restrictions are appropriate to their circumstances on the
values they use as keys for DNS lookup requests, and on the values
returned by the DNS. If the client has such restrictions, it is
solely responsible for validating the data from the DNS to ensure
that it conforms before it makes any use of that data.
See also [RFC1123] section 6.1.3.5.
{quote}
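A minimal sketch of the length limits described in that quote, for illustration only. The class and method names are hypothetical (this is not Commons Validator code), and it assumes the 255-octet limit is counted in wire format: one length octet per label plus the terminating root label.

```java
import java.nio.charset.StandardCharsets;

class DnsNameLengthCheck {
    static final int MAX_LABEL_OCTETS = 63;
    static final int MAX_NAME_OCTETS = 255;

    // Checks only the RFC 2181 length limits; deliberately places
    // no restriction on which characters a label contains.
    static boolean withinDnsLengthLimits(String name) {
        if (name.equals(".")) {
            return true; // the root of the DNS tree
        }
        // Accept an optional trailing dot (fully qualified form).
        String n = name.endsWith(".") ? name.substring(0, name.length() - 1) : name;
        int wireLength = 1; // assumption: one octet for the terminating root label
        for (String label : n.split("\\.", -1)) {
            int octets = label.getBytes(StandardCharsets.UTF_8).length;
            if (octets < 1 || octets > MAX_LABEL_OCTETS) {
                return false;
            }
            wireLength += 1 + octets; // length octet + label content
        }
        return wireLength <= MAX_NAME_OCTETS;
    }

    public static void main(String[] args) {
        // The underscore is irrelevant at this level: only lengths are checked.
        System.out.println(withinDnsLengthLimits("www.nasza_sp77.republika.pl")); // true
    }
}
```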
So it seems that DNS itself places very few limitations on characters, which is
why I jumped to RFC 1123, mentioned above. There I found the following statement:
{quote}
The syntax of a legal Internet host name was specified in RFC-952
[DNS:4]. One aspect of host name syntax is hereby changed: the
restriction on the first character is relaxed to allow either a
letter or a digit. Host software MUST support this more liberal
syntax.
Host software MUST handle host names of up to 63 characters and
SHOULD handle host names of up to 255 characters.
Whenever a user inputs the identity of an Internet host, it SHOULD
be possible to enter either (1) a host domain name or (2) an IP
address in dotted-decimal ("#.#.#.#") form. The host SHOULD check
the string syntactically for a dotted-decimal number before
looking it up in the Domain Name System.
{quote}
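The host name syntax from that quote can be sketched roughly as below. This is an illustration under stated assumptions, not the Commons Validator implementation: the class and method names are hypothetical, the character set (letters, digits, hyphen) is taken from RFC 952 with the RFC 1123 relaxation of the first character, and a trailing dot is not accepted.

```java
import java.util.regex.Pattern;

class HostNameSyntax {
    // One label: letters, digits and hyphens only, starting and ending
    // with a letter or digit, 1 to 63 characters long.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    // Purely syntactic "#.#.#.#" check; it does not validate octet ranges.
    private static final Pattern DOTTED_DECIMAL =
            Pattern.compile("\\d{1,3}(?:\\.\\d{1,3}){3}");

    static boolean looksLikeDottedDecimal(String s) {
        return DOTTED_DECIMAL.matcher(s).matches();
    }

    static boolean isRfc1123HostName(String host) {
        if (host.isEmpty() || host.length() > 255) {
            return false;
        }
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isRfc1123HostName("www.republika.pl"));            // true
        System.out.println(isRfc1123HostName("www.nasza_sp77.republika.pl")); // false
        System.out.println(looksLikeDottedDecimal("192.0.2.1"));              // true
    }
}
```

Note that an all-digit name such as 192.0.2.1 also satisfies the label pattern, which is why the quote recommends checking for the dotted-decimal form first.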
So that would suggest that the underscore is not allowed in Internet host names
(RFC 952 permits only letters, digits and hyphens in each label), even though it
is allowed in domain names stored in DNS.
BUT as you can see, the URL shown in the bug report
(http://www.nasza_sp77.republika.pl/) exists and works (browsers handle it
correctly), so I am wondering whether I have overlooked something. Some newer RFC maybe?
> isValid return false for valid URL
> ----------------------------------
>
> Key: VALIDATOR-318
> URL: https://issues.apache.org/jira/browse/VALIDATOR-318
> Project: Commons Validator
> Issue Type: Bug
> Affects Versions: 1.3.1 Release, 1.4.0 Release
> Reporter: Rafał Figas
>
> isValid returns false for the following URL:
> http://www.nasza_sp77.republika.pl/
> which seems to be perfectly legal according to RFC 3986:
> http://tools.ietf.org/html/rfc3986#section-2.3
> And the discussions here:
> http://stackoverflow.com/questions/2180465/can-someone-have-a-subdomain-with-an-underscore-in-it
> http://stackoverflow.com/questions/10959757/the-use-of-the-underscore-in-host-names
> Test code:
> {code}
> String[] schemes = { "http", "https" };
> UrlValidator validator = new UrlValidator(schemes, UrlValidator.NO_FRAGMENTS);
> return validator.isValid("http://www.nasza_sp77.republika.pl/");
> {code}
> This is somewhat similar to VALIDATOR-204, but that issue concerned the path part
> of the URL, whereas this one concerns an underscore in the hostname.