Re: [whatwg] IPv4 parsing

2015-07-01 Thread Anne van Kesteren
On Wed, Jun 24, 2015 at 10:56 PM, Ryan Sleevi sle...@google.com wrote:
[...]  All
the forms except for decimal octets are seen as non-standard (despite
being quite widely interoperable) and undesirable.

They are no longer non-standard, though still non-conforming. Or, in
other words, https://url.spec.whatwg.org/ has an IPv4 parser now.


-- 
https://annevankesteren.nl/


Re: [whatwg] IPv4 parsing

2015-06-24 Thread Peter Kasting
On Wed, Jun 24, 2015 at 3:46 AM, timeless timel...@gmail.com wrote:

 The trailing dot actually had meaning, but in my periodic testing most
 commerce websites didn't handle it well. It didn't help that browsers never
 favored adding it.

 On a somewhat (user) hostile network, http://discover.com/ might go to
 http://discover.com.example.com/; this probably isn't what the user wanted
 (it certainly wasn't what I wanted when I tested), but using
 http://discover.com./ got unfortunate redirects or unhappy responses from
 the remote server.


That's all relevant for trailing dots on hostnames; I think the context
here is trailing dots on IP addresses, which I don't think have the same
meaning, since "force this to be treated as a FQDN" doesn't really mean
anything when you're not doing DNS resolution.  I believe for non-IP
hostnames, Chrome should be respecting the trailing dot.

For IPs, losing the trailing dot seems OK to me.

PK


Re: [whatwg] IPv4 parsing

2015-06-24 Thread timeless
The trailing dot actually had meaning, but in my periodic testing most
commerce websites didn't handle it well. It didn't help that browsers never
favored adding it.

On a somewhat (user) hostile network, http://discover.com/ might go to
http://discover.com.example.com/; this probably isn't what the user wanted
(it certainly wasn't what I wanted when I tested), but using
http://discover.com./ got unfortunate redirects or unhappy responses from
the remote server.

These days with HSTS, mobile phones, and hopefully some future ubiquitous
VPN, we can ignore the risks of hostile local DNS/DHCP servers.

Also fun and probably worth documenting is how http://127.1/ and
http://127.2.1/ are parsed. I doubt the average developer knows (unless
they specifically deal with low level networking).

You have http://0.0.0.66/ that's not a match for your example...
On Jun 22, 2015 12:52 PM, Anne van Kesteren ann...@annevk.nl wrote:

 I've done some research into how Chrome parses IPv4 addresses to see
 if that's worth standardizing.

 Most browsers do not have special parsing rules for IPv4 vs domain
 names. That is, they pass the domain name to the network layer and
 let that figure out what should happen. Typically, that results in a
 URL such as http://0x42。0./ (note the 。 and trailing .) to end up
 connecting to IPv4 address 66.0.0.0 with 0x42.0. in the Host header.
 The resulting parsed URL will be http://0x42.0./.

 Chrome will instead have 66.0.0.0 in the Host header and its parsed
 URL will have that value too. That means you lose functionality
 Host-header wise, but it is more predictable (and no longer depends on
 the networking stack) where you connect to. That seems somewhat more
 secure, since it might not be entirely obvious that e.g. http://0x42./
 is not a domain name. If the resulting URL is http://0.0.0.66/ it's
 very clear what is going on. And we don't depend on whatever the OS
 networking library does.

 Now, is that what we want? Is losing the trailing dot acceptable?


 --
 https://annevankesteren.nl/



Re: [whatwg] IPv4 parsing

2015-06-24 Thread Ryan Sleevi
On Wed, Jun 24, 2015 at 1:50 PM, Tim Streater t...@clothears.org.uk wrote:

 On 24 Jun 2015 at 20:15, Peter Kasting pkast...@google.com wrote:

  1.66 = 1.0.0.66
  1.256 = 1.0.1.0
  1.2.66 = 1.2.0.66
  1.256.66 = invalid

 This makes no sense at all.


https://tools.ietf.org/html/draft-main-ipaddr-text-rep-02#section-2.1.1
explains why

Quoth the text

Meanwhile, a very popular implementation of IP networking went off in
its own direction.  4.2BSD introduced a function inet_aton(), whose
job was to interpret character strings as IP addresses.  It
interpreted both of the syntaxes mentioned in [MTP] (see above): a
single number giving the entire 32-bit address, and dot-separated
octet values.  It also interpreted two intermediate syntaxes: octet-
dot-octet-dot-16bits, intended for class B addresses, and octet-
dot-24bits, intended for class A addresses.  It also allowed some
flexibility in how the individual numeric parts were specified: it
allowed octal and hexadecimal in addition to decimal, distinguishing
these radices by using the C language syntax involving a prefix "0"
or "0x", and allowed the numbers to be arbitrarily long.
The 4.2BSD inet_aton() has been widely copied and imitated, and so is
a de facto standard for the textual representation of IPv4 addresses.
Nevertheless, these alternative syntaxes have now fallen out of use
(if they ever had significant use).  The only practical use that they
now see is for deliberate obfuscation of addresses: giving an IPv4
address as a single 32-bit decimal number is favoured among people
wishing to conceal the true location that is encoded in a URL.  All
the forms except for decimal octets are seen as non-standard (despite
being quite widely interoperable) and undesirable.
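
A minimal sketch of the radix rule described in the quoted draft, for
a single dot-separated part. The helper name parse_part is
hypothetical and is not taken from any real inet_aton() implementation.
Rejecting a bare "0x" is an assumption of this sketch; real
implementations may treat it differently.

```python
# Hypothetical helper illustrating C-style radix prefixes for one part.
_ALLOWED = {
    8: set("01234567"),
    10: set("0123456789"),
    16: set("0123456789abcdef"),
}


def parse_part(part: str) -> int:
    """Parse one part as hex ("0x" prefix), octal ("0" prefix) or decimal."""
    text = part.lower()
    if text.startswith("0x"):
        base, digits = 16, text[2:]
    elif len(text) > 1 and text.startswith("0"):
        base, digits = 8, text[1:]
    else:
        base, digits = 10, text
    # Assumption: an empty digit string (e.g. a bare "0x") is an error here.
    if not digits or not set(digits) <= _ALLOWED[base]:
        raise ValueError(f"not a base-{base} number: {part!r}")
    return int(digits, base)
```

Under these assumptions, "0x42", "0102" and "66" all denote the same
value, which is part of why such forms lend themselves to obfuscation.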


Re: [whatwg] IPv4 parsing

2015-06-24 Thread Tim Streater
On 24 Jun 2015 at 20:15, Peter Kasting pkast...@google.com wrote: 

 How Chrome's omnibox handles this (which I think is compliant with most
 other places):

 If there are no dots in the middle of the expression, the number is
 converted to powers-of-256 format and leading 0s are prepended to reach
 four octets:

 66 = 0.0.0.66
 256 = 0.0.1.0

 If there are dots in the middle, the number after the last dot is treated
 as above, while the numbers before the dots must satisfy 0 <= n <= 255 and
 are placed into the highest octets in order:

 1.66 = 1.0.0.66
 1.256 = 1.0.1.0
 1.2.66 = 1.2.0.66
 1.256.66 = invalid

This makes no sense at all.

--
Cheers  --  Tim


Re: [whatwg] IPv4 parsing

2015-06-24 Thread Anne van Kesteren
On Wed, Jun 24, 2015 at 3:46 AM, timeless timel...@gmail.com wrote:
 Also fun and probably worth documenting is how http://127.1/ and
 http://127.2.1/ are parsed. I doubt the average developer knows (unless they
 specifically deal with low level networking).

The question is whether the parsing happens at the URL parser layer or
at the network layer.


 You have http://0.0.0.66/ that's not a match for your example...

I'm not sure what you mean here.


-- 
https://annevankesteren.nl/


Re: [whatwg] IPv4 parsing

2015-06-24 Thread Tab Atkins Jr.
On Wed, Jun 24, 2015 at 7:21 AM, Anne van Kesteren ann...@annevk.nl wrote:
 On Wed, Jun 24, 2015 at 3:46 AM, timeless timel...@gmail.com wrote:
 You have http://0.0.0.66/ that's not a match for your example...

 I'm not sure what you mean here.

You swap between 0.0.0.66 and 66.0.0.0 in your OP.

~TJ


Re: [whatwg] IPv4 parsing

2015-06-24 Thread Tab Atkins Jr.
On Wed, Jun 24, 2015 at 9:23 AM, Anne van Kesteren ann...@annevk.nl wrote:
 On Wed, Jun 24, 2015 at 9:06 AM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 You swap between 0.0.0.66 and 66.0.0.0 in your OP.

 Actually, the input URL in that case is different. 0x42.0. != 0x42.

Well *that's* confusing. ^_^  Def spec this in detail, please.

~TJ


Re: [whatwg] IPv4 parsing

2015-06-24 Thread Anne van Kesteren
On Wed, Jun 24, 2015 at 9:06 AM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 You swap between 0.0.0.66 and 66.0.0.0 in your OP.

Actually, the input URL in that case is different. 0x42.0. != 0x42.
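
A rough sketch of why the two inputs differ, assuming the host is
normalized the way the original post describes (the ideographic full
stop is treated as a dot and one trailing dot is dropped). The function
name host_to_ipv4 is hypothetical, and only hexadecimal and decimal
parts are handled here.

```python
def host_to_ipv4(host: str) -> str:
    """Expand a one- to four-part numeric host into dotted-decimal form."""
    # Assumption: U+3002 is treated as "." and one trailing dot is dropped.
    host = host.replace("\u3002", ".")
    if host.endswith("."):
        host = host[:-1]
    parts = host.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected one to four parts")
    numbers = [
        int(p, 16) if p.lower().startswith("0x") else int(p, 10)
        for p in parts
    ]
    *leading, last = numbers
    # Leading parts are single octets; the last part fills what remains.
    if any(not 0 <= n <= 255 for n in leading):
        raise ValueError("leading part out of range")
    if not 0 <= last < 256 ** (4 - len(leading)):
        raise ValueError("last part out of range")
    value = last
    for index, octet in enumerate(leading):
        value += octet << (8 * (3 - index))
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(host_to_ipv4("0x42.0."))  # two parts: 0x42 is the first octet
print(host_to_ipv4("0x42."))    # one part: 0x42 is the whole 32-bit value
```

With two parts the first number lands in the highest octet, while a
single part is read as the entire address, so the same 0x42 ends up at
opposite ends.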


-- 
https://annevankesteren.nl/


Re: [whatwg] IPv4 parsing

2015-06-24 Thread Peter Kasting
On Wed, Jun 24, 2015 at 9:37 AM, Tab Atkins Jr. jackalm...@gmail.com
wrote:

 On Wed, Jun 24, 2015 at 9:23 AM, Anne van Kesteren ann...@annevk.nl
 wrote:
  On Wed, Jun 24, 2015 at 9:06 AM, Tab Atkins Jr. jackalm...@gmail.com
 wrote:
  You swap between 0.0.0.66 and 66.0.0.0 in your OP.
 
  Actually, the input URL in that case is different. 0x42.0. != 0x42.

 Well *that's* confusing. ^_^  Def spec this in detail, please.


How Chrome's omnibox handles this (which I think is compliant with most
other places):

If there are no dots in the middle of the expression, the number is
converted to powers-of-256 format and leading 0s are prepended to reach
four octets:

66 = 0.0.0.66
256 = 0.0.1.0

If there are dots in the middle, the number after the last dot is treated
as above, while the numbers before the dots must satisfy 0 <= n <= 255 and
are placed into the highest octets in order:

1.66 = 1.0.0.66
1.256 = 1.0.1.0
1.2.66 = 1.2.0.66
1.256.66 = invalid
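
The rules above can be sketched roughly as follows. This is not
Chrome's code: the function name expand_shorthand is hypothetical, only
decimal parts are handled (no octal or hex prefixes), and invalid input
is reported as None.

```python
from typing import Optional


def expand_shorthand(text: str) -> Optional[str]:
    """Expand decimal shorthand such as "1.66" to four dotted octets."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    if not all(p.isascii() and p.isdigit() for p in parts):
        return None
    *leading, last = (int(p) for p in parts)
    # Numbers before the last dot must each fit in a single octet.
    if any(n > 255 for n in leading):
        return None
    # The last number is spread over all remaining octets (powers of 256).
    remaining = 4 - len(leading)
    if last >= 256 ** remaining:
        return None
    octets = leading + list(last.to_bytes(remaining, "big"))
    return ".".join(str(o) for o in octets)


for example in ("66", "256", "1.66", "1.256", "1.2.66", "1.256.66"):
    print(example, "=", expand_shorthand(example) or "invalid")
```

The same rule accounts for the http://127.1/ case raised earlier in the
thread: 127 fills the first octet and 1 fills the remaining three.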

PK


[whatwg] IPv4 parsing

2015-06-22 Thread Anne van Kesteren
I've done some research into how Chrome parses IPv4 addresses to see
if that's worth standardizing.

Most browsers do not have special parsing rules for IPv4 vs domain
names. That is, they pass the domain name to the network layer and
let that figure out what should happen. Typically, that results in a
URL such as http://0x42。0./ (note the 。 and trailing .) to end up
connecting to IPv4 address 66.0.0.0 with 0x42.0. in the Host header.
The resulting parsed URL will be http://0x42.0./.

Chrome will instead have 66.0.0.0 in the Host header and its parsed
URL will have that value too. That means you lose functionality
Host-header wise, but it is more predictable (and no longer depends on
the networking stack) where you connect to. That seems somewhat more
secure, since it might not be entirely obvious that e.g. http://0x42./
is not a domain name. If the resulting URL is http://0.0.0.66/ it's
very clear what is going on. And we don't depend on whatever the OS
networking library does.

Now, is that what we want? Is losing the trailing dot acceptable?


-- 
https://annevankesteren.nl/