Re: [whatwg] IPv4 parsing
On Wed, Jun 24, 2015 at 10:56 PM, Ryan Sleevi sle...@google.com wrote:
> [...] All the forms except for decimal octets are seen as non-standard (despite being quite widely interoperable) and undesirable.

They are no longer non-standard, though still non-conforming. Or, in other words, https://url.spec.whatwg.org/ has an IPv4 parser now.

-- https://annevankesteren.nl/
Re: [whatwg] IPv4 parsing
On Wed, Jun 24, 2015 at 3:46 AM, timeless timel...@gmail.com wrote:
> The trailing dot actually had meaning, but in my periodic testing most commerce websites didn't handle it well. It didn't help that browsers never favored adding it. On a somewhat (user) hostile network, http://discover.com/ might go to http://discover.com.example.com/; this probably isn't what the user wanted (it certainly wasn't what I wanted when I tested), but using http://discover.com./ got unfortunate redirects or unhappy responses from the remote server.

That's all relevant for trailing dots on hostnames; I think the context here is trailing dots on IP addresses, which I don't think have the same meaning, since "force this to be treated as an FQDN" doesn't really mean anything when you're not doing DNS resolution.

I believe for non-IP hostnames, Chrome should be respecting the trailing dot. For IPs, losing the trailing dot seems OK to me.

PK
Re: [whatwg] IPv4 parsing
The trailing dot actually had meaning, but in my periodic testing most commerce websites didn't handle it well. It didn't help that browsers never favored adding it. On a somewhat (user) hostile network, http://discover.com/ might go to http://discover.com.example.com/; this probably isn't what the user wanted (it certainly wasn't what I wanted when I tested), but using http://discover.com./ got unfortunate redirects or unhappy responses from the remote server.

These days with HSTS, mobile phones, and hopefully some future ubiquitous VPN, we can ignore the risks of hostile local DNS/DHCP servers.

Also fun and probably worth documenting is how http://127.1/ and http://127.2.1/ are parsed. I doubt the average developer knows (unless they specifically deal with low level networking).

You have http://0.0.0.66/; that's not a match for your example...

On Jun 22, 2015 12:52 PM, Anne van Kesteren ann...@annevk.nl wrote:
> I've done some research into how Chrome parses IPv4 addresses to see if that's worth standardizing.
>
> Most browsers do not have special parsing rules for IPv4 vs domain names. That is, they pass the domain name to the network layer and let that figure out what should happen. Typically, that causes a URL such as http://0x42。0./ (note the 。 and trailing .) to end up connecting to IPv4 address 66.0.0.0 with 0x42.0. in the Host header. The resulting parsed URL will be http://0x42.0./.
>
> Chrome will instead have 66.0.0.0 in the Host header and its parsed URL will have that value too. That means you lose functionality Host-header wise, but it is more predictable (and no longer depends on the networking stack) where you connect to.
>
> That seems somewhat more secure, since it might not be entirely obvious that e.g. http://0x42./ is not a domain name. If the resulting URL is http://0.0.0.66/ it's very clear what is going on. And we don't depend on whatever the OS networking library does.
>
> Now, is that what we want? Is losing the trailing dot acceptable?
>
> -- https://annevankesteren.nl/
Re: [whatwg] IPv4 parsing
On Wed, Jun 24, 2015 at 1:50 PM, Tim Streater t...@clothears.org.uk wrote:
> On 24 Jun 2015 at 20:15, Peter Kasting pkast...@google.com wrote:
> > 1.66 = 1.0.0.66
> > 1.256 = 1.0.1.0
> > 1.2.66 = 1.2.0.66
> > 1.256.66 = invalid
>
> This makes no sense at all.

https://tools.ietf.org/html/draft-main-ipaddr-text-rep-02#section-2.1.1 explains why.

Quoth the text:

Meanwhile, a very popular implementation of IP networking went off in its own direction. 4.2BSD introduced a function inet_aton(), whose job was to interpret character strings as IP addresses. It interpreted both of the syntaxes mentioned in [MTP] (see above): a single number giving the entire 32-bit address, and dot-separated octet values. It also interpreted two intermediate syntaxes: octet-dot-octet-dot-16bits, intended for class B addresses, and octet-dot-24bits, intended for class A addresses. It also allowed some flexibility in how the individual numeric parts were specified: it allowed octal and hexadecimal in addition to decimal, distinguishing these radices by using the C language syntax involving a prefix 0 or 0x, and allowed the numbers to be arbitrarily long.

The 4.2BSD inet_aton() has been widely copied and imitated, and so is a de facto standard for the textual representation of IPv4 addresses. Nevertheless, these alternative syntaxes have now fallen out of use (if they ever had significant use). The only practical use that they now see is for deliberate obfuscation of addresses: giving an IPv4 address as a single 32-bit decimal number is favoured among people wishing to conceal the true location that is encoded in a URL. All the forms except for decimal octets are seen as non-standard (despite being quite widely interoperable) and undesirable.
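For illustration only, the radix rule in the quoted draft text (a prefix of 0x for hexadecimal, a prefix of 0 for octal, otherwise decimal) might be sketched in Python roughly as follows. The helper name parse_c_style_number is hypothetical, and rejecting a bare "0x" is an assumption of this sketch rather than something the draft text states.

```python
from typing import Optional


def parse_c_style_number(part: str) -> Optional[int]:
    """Parse one address part using C-style radix prefixes (hypothetical helper).

    Returns None when the part is empty or contains a digit that is not
    valid for the radix implied by its prefix.
    """
    if part[:2] in ("0x", "0X"):
        # Hexadecimal: "0x42" means 66.
        digits, radix, allowed = part[2:], 16, "0123456789abcdefABCDEF"
    elif len(part) > 1 and part[0] == "0":
        # Octal: "010" means 8.
        digits, radix, allowed = part[1:], 8, "01234567"
    else:
        # Plain decimal, including a lone "0".
        digits, radix, allowed = part, 10, "0123456789"
    # Assumption of this sketch: a prefix with no digits after it is rejected.
    if not digits or any(ch not in allowed for ch in digits):
        return None
    return int(digits, radix)
```

Under these assumptions, parse_c_style_number("0x42") gives 66 and parse_c_style_number("010") gives 8, while "08" is rejected because 8 is not an octal digit.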
Re: [whatwg] IPv4 parsing
On 24 Jun 2015 at 20:15, Peter Kasting pkast...@google.com wrote:
> How Chrome's omnibox handles this (which I think is compliant with most other places):
>
> If there are no dots in the middle of the expression, the number is converted to powers-of-256 format and leading 0s are prepended to reach four octets:
>
> 66 = 0.0.0.66
> 256 = 0.0.1.0
>
> If there are dots in the middle, the number after the last dot is treated as above, while the numbers before the dots must satisfy 0 <= n <= 255 and are placed into the highest octets in order:
>
> 1.66 = 1.0.0.66
> 1.256 = 1.0.1.0
> 1.2.66 = 1.2.0.66
> 1.256.66 = invalid

This makes no sense at all.

-- Cheers -- Tim
Re: [whatwg] IPv4 parsing
On Wed, Jun 24, 2015 at 3:46 AM, timeless timel...@gmail.com wrote:
> Also fun and probably worth documenting is how http://127.1/ and http://127.2.1/ are parsed. I doubt the average developer knows (unless they specifically deal with low level networking).

The question is whether the parsing happens at the URL parser layer or at the network layer.

> You have http://0.0.0.66/; that's not a match for your example...

I'm not sure what you mean here.

-- https://annevankesteren.nl/
Re: [whatwg] IPv4 parsing
On Wed, Jun 24, 2015 at 7:21 AM, Anne van Kesteren ann...@annevk.nl wrote:
> On Wed, Jun 24, 2015 at 3:46 AM, timeless timel...@gmail.com wrote:
> > You have http://0.0.0.66/; that's not a match for your example...
>
> I'm not sure what you mean here.

You swap between 0.0.0.66 and 66.0.0.0 in your OP.

~TJ
Re: [whatwg] IPv4 parsing
On Wed, Jun 24, 2015 at 9:23 AM, Anne van Kesteren ann...@annevk.nl wrote:
> On Wed, Jun 24, 2015 at 9:06 AM, Tab Atkins Jr. jackalm...@gmail.com wrote:
> > You swap between 0.0.0.66 and 66.0.0.0 in your OP.
>
> Actually, the input URL in that case is different. 0x42.0. != 0x42.

Well *that's* confusing. ^_^ Def spec this in detail, please.

~TJ
Re: [whatwg] IPv4 parsing
On Wed, Jun 24, 2015 at 9:06 AM, Tab Atkins Jr. jackalm...@gmail.com wrote:
> You swap between 0.0.0.66 and 66.0.0.0 in your OP.

Actually, the input URL in that case is different. 0x42.0. != 0x42.

-- https://annevankesteren.nl/
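As a rough illustration of why the two inputs differ, the arithmetic can be checked with Python's standard ipaddress module. This assumes the inet_aton()-style reading discussed in this thread (a single number is the whole 32-bit address; with two parts, the first is the top octet and the second fills the low 24 bits) and that the trailing dot is simply dropped.

```python
import ipaddress

# "0x42." has a single part, so the whole value 0x42 (66) is the 32-bit address.
single_part = ipaddress.IPv4Address(0x42)

# "0x42.0." has two parts: 0x42 becomes the first octet and 0 fills the low 24 bits.
two_parts = ipaddress.IPv4Address((0x42 << 24) | 0)

print(single_part)  # 0.0.0.66
print(two_parts)    # 66.0.0.0
```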
Re: [whatwg] IPv4 parsing
On Wed, Jun 24, 2015 at 9:37 AM, Tab Atkins Jr. jackalm...@gmail.com wrote:
> On Wed, Jun 24, 2015 at 9:23 AM, Anne van Kesteren ann...@annevk.nl wrote:
> > On Wed, Jun 24, 2015 at 9:06 AM, Tab Atkins Jr. jackalm...@gmail.com wrote:
> > > You swap between 0.0.0.66 and 66.0.0.0 in your OP.
> >
> > Actually, the input URL in that case is different. 0x42.0. != 0x42.
>
> Well *that's* confusing. ^_^ Def spec this in detail, please.

How Chrome's omnibox handles this (which I think is compliant with most other places):

If there are no dots in the middle of the expression, the number is converted to powers-of-256 format and leading 0s are prepended to reach four octets:

66 = 0.0.0.66
256 = 0.0.1.0

If there are dots in the middle, the number after the last dot is treated as above, while the numbers before the dots must satisfy 0 <= n <= 255 and are placed into the highest octets in order:

1.66 = 1.0.0.66
1.256 = 1.0.1.0
1.2.66 = 1.2.0.66
1.256.66 = invalid

PK
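The rules above can be sketched in Python as follows. This is a minimal sketch, not Chrome's code: the function name expand_short_ipv4 is hypothetical, only decimal parts are handled, and rejecting a final number that is too large for the remaining octets is an assumption, since the message does not say what happens in that case.

```python
from typing import Optional


def expand_short_ipv4(text: str) -> Optional[str]:
    """Expand a 1-4 part decimal IPv4 shorthand to dotted-quad form.

    Returns None when the input is invalid under the rules quoted above.
    """
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    # Decimal only in this sketch; hex and octal prefixes are not handled.
    if not all(p.isascii() and p.isdigit() for p in parts):
        return None
    numbers = [int(p) for p in parts]
    *leading, last = numbers
    # Every number before the last dot must fit in a single octet.
    if any(n > 255 for n in leading):
        return None
    # The last number fills all remaining octets, in powers of 256.
    remaining = 4 - len(leading)
    if last >= 256 ** remaining:
        # Assumption: an overflowing final number is treated as invalid.
        return None
    tail = [(last >> (8 * shift)) & 0xFF for shift in range(remaining - 1, -1, -1)]
    return ".".join(str(octet) for octet in leading + tail)
```

With this sketch, the hosts mentioned earlier in the thread, 127.1 and 127.2.1, expand to 127.0.0.1 and 127.2.0.1, and 1.256.66 is rejected because 256 does not fit in a single octet.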
[whatwg] IPv4 parsing
I've done some research into how Chrome parses IPv4 addresses to see if that's worth standardizing.

Most browsers do not have special parsing rules for IPv4 vs domain names. That is, they pass the domain name to the network layer and let that figure out what should happen. Typically, that causes a URL such as http://0x42。0./ (note the 。 and trailing .) to end up connecting to IPv4 address 66.0.0.0 with 0x42.0. in the Host header. The resulting parsed URL will be http://0x42.0./.

Chrome will instead have 66.0.0.0 in the Host header and its parsed URL will have that value too. That means you lose functionality Host-header wise, but it is more predictable (and no longer depends on the networking stack) where you connect to.

That seems somewhat more secure, since it might not be entirely obvious that e.g. http://0x42./ is not a domain name. If the resulting URL is http://0.0.0.66/ it's very clear what is going on. And we don't depend on whatever the OS networking library does.

Now, is that what we want? Is losing the trailing dot acceptable?

-- https://annevankesteren.nl/