Iñaki Baz Castillo <[email protected]> wrote:
> 2010/5/7 Eric Wong <[email protected]>:
> > Underscore isn't valid for hostnames, but it is allowed in domain names
> > and most DNS servers will resolve them.  I've personally seen websites
> > with underscores in their domain names in the wild[1].
>
> Hi Eric, could you point me to the spec stating that underscore is
> allowed for a domain? In the past I've done a SIP parser [*] with
> Ragel, being 100% strict at BNF grammar, and note that SIP reuses 80%
> of the grammar of HTTP. I'm pretty sure that "_" is not valid in a
> domain (host, hostname or whatever). Anyhow it's better just to allow
> it at parsing level :)

http://www.ietf.org/rfc/rfc2782.txt

Even if it's not part of the RFC, our parser will match reality and
accommodate broken things we see in the wild, as it has done in the
past: http://mid.gmane.org/20080327215027.ga14...@untitled

> > We'll have to test the IPv6 addresses and probably split that out into a
> > separate regexp since ":" would raise issues with the port number in
> > existing cases.  This is probably something for post-1.0.
>
> There is an IETF draft to improve and *fix* the existing BNF grammar for
> IPv6. It also improves the grammar for IPv4 (by disallowing values
> greater than 255):
>
>   http://tools.ietf.org/html/draft-ietf-sip-ipv6-abnf-fix
>
> I've already implemented it in Ragel and I can assure you that it's 100%
> valid and strict (I've done lots of tests):
>
>   alphanum      = ALPHA | DIGIT;
>   domainlabel   = alphanum | ( alphanum ( alphanum | "-" )* alphanum );
>   toplabel      = ALPHA | ( ALPHA ( alphanum | "-" )* alphanum );
>   hostname      = ( domainlabel "." )* toplabel "."?;
>   dec_octet     = DIGIT | ( 0x31..0x39 DIGIT ) | ( "1" DIGIT{2} ) |
>                   ( "2" 0x30..0x34 DIGIT ) | ( "25" 0x30..0x35 );
>   IPv4address   = dec_octet "." dec_octet "." dec_octet "." dec_octet;
>   h16           = HEXDIG{1,4};
>   ls32          = ( h16 ":" h16 ) | IPv4address;
>   IPv6address   = ( ( h16 ":" ){6} ls32 ) |
>                   ( "::" ( h16 ":" ){5} ls32 ) |
>                   ( h16? "::" ( h16 ":" ){4} ls32 ) |
>                   ( ( ( h16 ":" )? h16 )? "::" ( h16 ":" ){3} ls32 ) |
>                   ( ( ( h16 ":" ){,2} h16 )? "::" ( h16 ":" ){2} ls32 ) |
>                   ( ( ( h16 ":" ){,3} h16 )? "::" h16 ":" ls32 ) |
>                   ( ( ( h16 ":" ){,4} h16 )? "::" ls32 ) |
>                   ( ( ( h16 ":" ){,5} h16 )? "::" h16 ) |
>                   ( ( ( h16 ":" ){,6} h16 )? "::" );
>   IPv6reference = "[" IPv6address "]";
>   host          = hostname | IPv4address | IPv6reference;
>   port          = DIGIT{1,5};
>   hostport      = host ( ":" port )?;
>
> This is much better than the deprecated and bogus grammar in RFC 2396 ;)

Thanks, it might be worth simplifying a bit for readability, simplicity
(and possibly performance) at the expense of 100% conformance.

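For illustration only, here is how the quoted grammar might translate into
a plain regexp, written in Python rather than Ragel. This is a sketch, not
Unicorn's parser: build_hostport_re, parse_hostport and the
allow_underscore switch are hypothetical names, and letting "_" into
domainlabel (but not toplabel) is just one assumed way of "matching
reality" as discussed above.

```python
import re


def build_hostport_re(allow_underscore=False):
    """Compile a regexp for the quoted `hostport` rule (hypothetical helper)."""
    alnum = "A-Za-z0-9"
    # Assumption: lenient mode admits "_" wherever domainlabel takes alphanum.
    lab = alnum + ("_" if allow_underscore else "")
    domainlabel = rf"[{lab}](?:[{lab}-]*[{lab}])?"
    toplabel = rf"[A-Za-z](?:[{alnum}-]*[{alnum}])?"
    hostname = rf"(?:{domainlabel}\.)*{toplabel}\.?"

    # dec_octet only admits 0..255, as in the quoted grammar.
    dec_octet = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
    ipv4 = rf"{dec_octet}(?:\.{dec_octet}){{3}}"

    h16 = r"[0-9A-Fa-f]{1,4}"
    ls32 = rf"(?:{h16}:{h16}|{ipv4})"
    # One alternative per branch of the quoted IPv6address rule.
    ipv6 = "|".join([
        rf"(?:{h16}:){{6}}{ls32}",
        rf"::(?:{h16}:){{5}}{ls32}",
        rf"(?:{h16})?::(?:{h16}:){{4}}{ls32}",
        rf"(?:(?:{h16}:)?{h16})?::(?:{h16}:){{3}}{ls32}",
        rf"(?:(?:{h16}:){{0,2}}{h16})?::(?:{h16}:){{2}}{ls32}",
        rf"(?:(?:{h16}:){{0,3}}{h16})?::{h16}:{ls32}",
        rf"(?:(?:{h16}:){{0,4}}{h16})?::{ls32}",
        rf"(?:(?:{h16}:){{0,5}}{h16})?::{h16}",
        rf"(?:(?:{h16}:){{0,6}}{h16})?::",
    ])

    host = (rf"(?P<hostname>{hostname})|(?P<ipv4>{ipv4})"
            rf"|\[(?P<ipv6>{ipv6})\]")
    # port = DIGIT{1,5}; like the grammar, this does not cap the value at 65535.
    return re.compile(rf"(?:{host})(?::(?P<port>[0-9]{{1,5}}))?")


STRICT = build_hostport_re(allow_underscore=False)
LENIENT = build_hostport_re(allow_underscore=True)


def parse_hostport(text, allow_underscore=False):
    """Return (kind, host, port_or_None), or None if text is not a hostport."""
    m = (LENIENT if allow_underscore else STRICT).fullmatch(text)
    if m is None:
        return None
    port = m.group("port")
    for kind in ("hostname", "ipv4", "ipv6"):
        if m.group(kind) is not None:
            return kind, m.group(kind), int(port) if port else None
    return None
```

With this sketch, parse_hostport("my_host.example.com") is rejected in
strict mode and accepted with allow_underscore=True, while "256.1.1.1" is
rejected in both modes because no dec_octet branch matches 256 and the
last label of a hostname must start with a letter.
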
--
Eric Wong
