control: retitle -1 libc6: support for non-compliant nameserver should be improved
control: severity -1 wishlist

On 2016-08-12 12:15, Vincent Lefevre wrote:
> On 2016-08-12 09:26:10 +0200, Aurelien Jarno wrote:
> > The libc does a first connection to the configured name server
> > (192.168.0.1) using UDP. Note the size of the packet, very close to
> > the 512 bytes limit without EDNS0 support. This very likely mean the
> > answer is marked as truncated (look at the number of entries in the
> > host answer).
> 
> According to tcpdump output below, there is no truncation: the number
> of A's and AAAA's (10 for each) match what "host keys.gnupg.net"
> gives. BTW, even if there were a truncation, there shouldn't be a
> failure: using of the returned IP addresses would be sufficient to
> connect.

That's a wrong assumption. The libc getaddrinfo interface is not meant
to connect to an IP, but rather to return *all* addresses corresponding
to a query. The returned IPs are not necessarily used for a connection
later. Not returning all addresses might lead to data loss or a
security issue. One example among others is the forward-confirmed
reverse DNS method used by some mail servers. Not returning all IPs
might lead to a rejected or discarded mail, depending on the policy.
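
To illustrate why a partial address list matters, here is a minimal
Python sketch of a forward-confirmed reverse DNS check. It is not what
any particular mail server does; the function names (reverse_lookup,
forward_lookup, fcrdns_ok), the host name mail.example.org and the
192.0.2.x addresses are made up for the example. The lookups can be
replaced by fakes so that the effect of a missing address is visible
without a network.

```python
import socket

def reverse_lookup(ip: str) -> str:
    """PTR lookup through the system resolver."""
    return socket.gethostbyaddr(ip)[0]

def forward_lookup(name: str) -> set:
    """Return every address getaddrinfo reports for a name."""
    return {info[4][0] for info in socket.getaddrinfo(name, None)}

def fcrdns_ok(ip: str, reverse=reverse_lookup, forward=forward_lookup) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must resolve back to ip."""
    try:
        name = reverse(ip)
        return ip in forward(name)
    except OSError:  # socket.herror and socket.gaierror derive from OSError
        return False

# Hypothetical data: the host really has two addresses.
ptr = lambda ip: "mail.example.org"
full = lambda name: {"192.0.2.1", "192.0.2.2"}
partial = lambda name: {"192.0.2.1"}  # resolver that dropped one address

print(fcrdns_ok("192.0.2.2", ptr, full))     # True
print(fcrdns_ok("192.0.2.2", ptr, partial))  # False
```

With the partial list, a legitimate client connecting from 192.0.2.2
fails the check, which is the kind of rejection described above.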

The point is that the local resolver is supposed to be working
correctly. If it doesn't, one can easily set up a local recursive name
server like unbound.

> 11:55:59.097743 IP 192.168.0.6.41008 > 192.168.0.1.domain: 60367+ A? 
> keys.gnupg.net. (32)
> 11:55:59.097796 IP 192.168.0.6.41008 > 192.168.0.1.domain: 31606+ AAAA? 
> keys.gnupg.net. (32)
> 11:55:59.098339 IP 192.168.0.6.38010 > 192.168.0.1.domain: 4217+ PTR? 
> 1.0.168.192.in-addr.arpa. (42)
> 11:55:59.143100 IP 192.168.0.1.domain > 192.168.0.6.38010: 4217 NXDomain* 
> 0/1/0 (94)
> 11:55:59.143325 IP 192.168.0.6.43592 > 192.168.0.1.domain: 23396+ PTR? 
> 6.0.168.192.in-addr.arpa. (42)
> 11:55:59.161082 IP 192.168.0.1.domain > 192.168.0.6.41008: 60367 11/9/5 CNAME 
> pool.sks-keyservers.net., A 198.128.3.63, A 93.94.119.246, A 78.46.223.54, A 
> 131.175.15.4, A 151.252.40.184, A 5.9.50.141, A 209.135.211.141, A 
> 5.135.158.148, A 68.187.0.77, A 193.17.17.6 (502)

This tcpdump trace doesn't show the answer header, so we don't know
whether the truncation flag is set. That said, the 11/9/5 says that the
answer
contains 11 answer records, 9 name server records and 5 additional
records. This clearly doesn't fit. A normal DNS server would just return
11 answers, so 11/0/0.

That said, I just realized that the strace entry in your previous email
contains the beginning of the answer:

> 30419 recvfrom(4, 
> "'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, 
> {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, 
> [16]) = 500

Converted into hexadecimal, this is:
  27 4a 83 80 00 01 00 0b 00 08 00 00 04 6b 65 79
  73 05 67 6e 75 70 67 03 6e 65 74 00 00 1c 00 01

274a is the identification. The flags are 8380, which corresponds to
QR, TC, RD, RA. Your name server clearly says that the answer is
truncated. On a working nameserver, the flags are 8180 for this query,
i.e. the same without the truncation flag.
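
For reference, the decoding above can be reproduced with a few lines of
Python. This is only a sketch: decode_header and FLAG_BITS are names
chosen for the example, and the bit positions are those of the header
layout in RFC 1035, section 4.1.1.

```python
import struct

# Flag bits of the 16-bit DNS header flags field (RFC 1035, section 4.1.1).
FLAG_BITS = {
    "QR": 0x8000,  # message is a response
    "AA": 0x0400,  # authoritative answer
    "TC": 0x0200,  # answer is truncated
    "RD": 0x0100,  # recursion desired
    "RA": 0x0080,  # recursion available
}

def decode_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte DNS header at the start of a packet."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", packet[:12])
    return {
        "id": ident,
        "flags": [name for name, bit in FLAG_BITS.items() if flags & bit],
        "opcode": (flags >> 11) & 0xF,
        "rcode": flags & 0xF,
        # question, answer, name server and additional record counts
        "counts": (qd, an, ns, ar),
    }

# First 12 bytes of the answer from the strace output.
header = bytes.fromhex("274a83800001000b00080000")
print(decode_header(header))
# {'id': 10058, 'flags': ['QR', 'TC', 'RD', 'RA'], 'opcode': 0,
#  'rcode': 0, 'counts': (1, 11, 8, 0)}
```

Replacing 8380 by 8180 in the input gives ['QR', 'RD', 'RA'], i.e. the
same flags without TC.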

> > It therefore looks to me like a bug with your network setup, not a
> > libc one.
> 
> Well, though I didn't want that, this is quite a standard network
> setup: my machine just uses DHCP with some standard ADSL modem
> router. And given that many users have similar issues and there
> isn't any problem with Android, I suppose that there's some bug
> on the libc side (or libc can be improved).

Even if it is a quite standard setup, you have to admit it doesn't
behave according to the RFC. You should complain to the manufacturer
and try to get a firmware update.

Trying to work around things on the libc side just gives even less
value to the RFCs, and encourages selling broken hardware.


> FYI, I also often get 5-second timeouts in name resolution whatever
> the host (you can see it above): I get the answer for A or AAAA, but
> sometimes, the other answer is lost. I have a DHCP hook that tests
> whether I'm using this router:
> 
> [...]
>   ping -n -c 1 -I "$interface" "$new_routers" > /dev/null
>   if grep -i -q $mac /proc/net/arp; then
>     logger "Google Public DNS with TCP to avoid recurrent timeout"
> [...]

This shows how broken your name server is. It probably has problems
with AAAA requests. Note that the RFC explicitly allows a server not to
support some request types (including AAAA ones), but in that case the
router must answer that it doesn't support them rather than simply drop
the query. You might want to try to work around this by using "options
single-request" or "options single-request-reopen" in /etc/resolv.conf.
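
As a rough sketch, assuming the router stays the configured name server
(192.168.0.1 in your trace), the file could look like the fragment
below. With single-request the A and AAAA queries are sent one after
the other instead of in parallel; single-request-reopen additionally
uses a new socket for the second query. Note that a DHCP client may
overwrite this file, so the option might have to be set through its
configuration instead.

```
# /etc/resolv.conf
nameserver 192.168.0.1
options single-request
```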

In short, it clearly shows that the problem comes from the name server
and not the GNU libc:
- the nameserver sets the truncation bit
- the nameserver doesn't answer on the TCP port
- the nameserver sometimes drops AAAA queries
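
The second point can be checked independently of the libc. The Python
sketch below builds a minimal query by hand and sends it over TCP; the
function names (build_query, answers_on_tcp), the default query ID and
the 3-second timeout are arbitrary choices for the example, not
something taken from the libc.

```python
import socket
import struct

def build_query(name: str, qtype: int = 1, ident: int = 0x1234) -> bytes:
    """Build a minimal DNS query: RD set, one question, class IN."""
    header = struct.pack("!6H", ident, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!2H", qtype, 1)

def answers_on_tcp(server: str, name: str, timeout: float = 3.0) -> bool:
    """Return True if the server sends back a length prefix on TCP port 53."""
    query = build_query(name)
    try:
        with socket.create_connection((server, 53), timeout=timeout) as sock:
            # DNS over TCP prefixes each message with a 2-byte length.
            sock.sendall(struct.pack("!H", len(query)) + query)
            return len(sock.recv(2)) == 2
    except OSError:  # refused, unreachable or timed out
        return False

# Example call (needs network access, so it is left commented out):
# print(answers_on_tcp("192.168.0.1", "keys.gnupg.net"))
```

The query built for keys.gnupg.net is 32 bytes long, the same size as
the queries in the tcpdump trace.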

With such a broken nameserver, I would advise you to use a local
nameserver like unbound instead.

The GNU libc might be improved to better cope with such broken
nameservers. That said, it is at most a wishlist severity and probably
a wontfix, as developing the workaround requires the hardware.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
