2010/5/7 Eric Wong <normalper...@yhbt.net>:
> Underscore isn't valid for hostnames, but it is allowed in domain names
> and most DNS servers will resolve them. I've personally seen websites
> with underscores in their domain names in the wild[1].

Hi Eric, could you point me to the spec stating that underscore is
allowed in a domain name? In the past I wrote a SIP parser [*] with
Ragel that is 100% strict with the BNF grammar, and note that SIP reuses
80% of the HTTP grammar. I'm pretty sure that "_" is not valid in a
domain (host, hostname or whatever). Anyhow, it's better just to allow
it at the parsing level :)

> We'll have to test the IPv6 addresses and probably split that out into a
> separate regexp since ":" would raise issues with the port number in
> existing cases. This is probably something for post-1.0.

There is an IETF draft to improve and *fix* the existing BNF grammar for
IPv6. It also improves the grammar for IPv4 (by disallowing values
greater than 255):

  http://tools.ietf.org/html/draft-ietf-sip-ipv6-abnf-fix

I've already implemented it in Ragel and I can assure you that it's 100%
valid and strict (I've done lots of tests):

  alphanum      = ALPHA | DIGIT;
  domainlabel   = alphanum | ( alphanum ( alphanum | "-" )* alphanum );
  toplabel      = ALPHA | ( ALPHA ( alphanum | "-" )* alphanum );
  hostname      = ( domainlabel "." )* toplabel "."?;
  dec_octet     = DIGIT | ( 0x31..0x39 DIGIT ) | ( "1" DIGIT{2} ) |
                  ( "2" 0x30..0x34 DIGIT ) | ( "25" 0x30..0x35 );
  IPv4address   = dec_octet "." dec_octet "." dec_octet "." dec_octet;
  h16           = HEXDIG{1,4};
  ls32          = ( h16 ":" h16 ) | IPv4address;
  IPv6address   = ( ( h16 ":" ){6} ls32 ) |
                  ( "::" ( h16 ":" ){5} ls32 ) |
                  ( h16? "::" ( h16 ":" ){4} ls32 ) |
                  ( ( ( h16 ":" )? h16 )? "::" ( h16 ":" ){3} ls32 ) |
                  ( ( ( h16 ":" ){,2} h16 )? "::" ( h16 ":" ){2} ls32 ) |
                  ( ( ( h16 ":" ){,3} h16 )? "::" h16 ":" ls32 ) |
                  ( ( ( h16 ":" ){,4} h16 )? "::" ls32 ) |
                  ( ( ( h16 ":" ){,5} h16 )? "::" h16 ) |
                  ( ( ( h16 ":" ){,6} h16 )? "::" );
  IPv6reference = "[" IPv6address "]";
  host          = hostname | IPv4address | IPv6reference;
  port          = DIGIT{1,5};
  hostport      = host ( ":" port )?;

This is much better than the deprecated and bogus grammar in RFC 2396 ;)

>> ------------------
>>
>> host_with_port = (hostname (":" digit*)?) >mark %host;
>>
>> - It allows something ugly as "mydomain.org:"
>>
>> I suggest:
>>   host_with_port = (hostname (":" digit{1,5})?) >mark %host;
>
> It's ugly, but section 3.2.2 of RFC 2396 appears to allows it.

Sometimes there are bugs in the RFCs related to parsing and BNF
grammars. I know of several cases. Unfortunately, RFCs cannot be fixed;
instead the errors are reported and a new draft or RFC "xxx-fix" appears
some years later.

>> message_header = ((field_name ":" " "* field_value)|value_cont) :> CRLF;
>>
>> - It doesn't allow valid spaces before ":" as:
>>     Host : mydomain.org
>
> Spaces before the ":" aren't allowed in rfc2616, and I have yet to see
> evidence of clients sending headers like this in ~4 years of using this
> parser.

In the SIP protocol, spaces and tabs before ":" are allowed. I really
expected the same to hold in HTTP, since the SIP grammar is based on the
HTTP grammar. But it could differ in some aspects, of course.

>> - Tabulators are also allowed.
>>
>> I suggest:
>>   message_header = ((field_name [ \t]* ":" [ \t]*
>>   field_value)|value_cont) :> CRLF;
>
> I just pushed this out to unicorn.git to allow horizontal tabs:

Thanks.

[*] http://dev.sipdoc.net/projects/ragel-sip-parser/wiki/Phase1

-- 
Iñaki Baz Castillo
<i...@aliax.net>
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying
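
For anyone who wants to try the hostport grammar from the message without Ragel, here is a minimal sketch that transcribes those rules into Python regular expressions. It is not the Unicorn parser or the SIP parser mentioned above: the function name `parse_hostport` and all constant names are hypothetical, and `re.fullmatch` with backtracking is assumed to accept the same strings as the Ragel machine.

```python
import re

# Hypothetical regex transcription of the Ragel rules quoted in the message.
ALNUM = r"[A-Za-z0-9]"
DOMAINLABEL = rf"(?:{ALNUM}(?:[A-Za-z0-9-]*{ALNUM})?)"
TOPLABEL = rf"(?:[A-Za-z](?:[A-Za-z0-9-]*{ALNUM})?)"
HOSTNAME = rf"(?:(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?)"

# dec_octet: 0-9, 10-99, 100-199, 200-249, 250-255 (no leading zeros).
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4 = rf"(?:{DEC_OCTET}(?:\.{DEC_OCTET}){{3}})"

H16 = r"[0-9A-Fa-f]{1,4}"
LS32 = rf"(?:{H16}:{H16}|{IPV4})"


def _lead(max_pairs):
    """Optional prefix before '::': up to max_pairs 'h16:' groups, then one h16."""
    return rf"(?:(?:{H16}:){{0,{max_pairs}}}{H16})?"


# The nine IPv6address alternatives, in the same order as the grammar.
IPV6 = "(?:" + "|".join([
    rf"(?:{H16}:){{6}}{LS32}",
    rf"::(?:{H16}:){{5}}{LS32}",
    rf"(?:{H16})?::(?:{H16}:){{4}}{LS32}",
    rf"{_lead(1)}::(?:{H16}:){{3}}{LS32}",
    rf"{_lead(2)}::(?:{H16}:){{2}}{LS32}",
    rf"{_lead(3)}::{H16}:{LS32}",
    rf"{_lead(4)}::{LS32}",
    rf"{_lead(5)}::{H16}",
    rf"{_lead(6)}::",
]) + ")"

HOST = rf"(?:{HOSTNAME}|{IPV4}|\[{IPV6}\])"
HOSTPORT = re.compile(rf"(?P<host>{HOST})(?::(?P<port>[0-9]{{1,5}}))?")


def parse_hostport(value):
    """Return (host, port) if value matches hostport, else None.

    port is None when the input carries no ":port" suffix.
    """
    match = HOSTPORT.fullmatch(value)
    if match is None:
        return None
    return match.group("host"), match.group("port")


if __name__ == "__main__":
    for sample in ("mydomain.org:8080", "mydomain.org:", "my_domain.org",
                   "[::1]:80", "256.1.1.1"):
        print(sample, "->", parse_hostport(sample))
```

Under these rules a bare trailing colon ("mydomain.org:"), an underscore in a label, and an octet above 255 are all rejected, while a bracketed IPv6 reference followed by a port is accepted because the brackets keep the address colons apart from the port separator.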