Re: DNS-over-TLS IPv4 interface ceases to respond

2018-07-31 Guillaume-Jean Herbiet via Unbound-users
Thanks for the quick reply.

We build our own kernels for these servers (currently 4.16.13), so the
kernel is fairly recent (compared to the 3.10 kernel shipped with CentOS).

I will disable so-reuseport, keep extra logging enabled on this server,
and see whether it happens again.
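
For the extra logging, a minimal sketch of the change to the server: clause, assuming the rest of the configuration stays as posted. Per the unbound.conf documentation, verbosity 1 only gives operational information, while level 2 adds detailed operational information including short information per query. Choosing level 2 rather than 3 or higher is an assumption, made to keep the log volume manageable on a box also used for stress tests.

```
server:
  # Was 1 (operational information only). Level 2 adds detailed
  # operational information, including short information per query.
  verbosity: 2
  # Already enabled in the posted config; timestamps help correlate
  # the log with the moment the IPv4 listener stops answering.
  log-time-ascii: yes
```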

On 2018-07-31 09:46, Wouter Wijngaards via Unbound-users wrote:
> Hi,
> 
> The so-reuseport option has in the past, for certain kernel versions
> (but I don't know which), caused trouble with TCP connections,
> specifically with IPv6 connections.  Since you have both IPv6 and IPv4
> and only IPv4 stops responding, it looks like a similar issue, even though
> it affects the other protocol and the earlier trouble was with NSD rather
> than Unbound.  Back then, the solution was to turn off so-reuseport (and
> they were so burned by the trouble that they haven't dared enable it
> again).  That may work for you too; if it solves the problem, perhaps a
> kernel upgrade can fix it?  Perhaps the server should test something (the
> kernel version?) before it enables so-reuseport?
> 
> However, it could also be something else; I don't know what.
> 
> Best regards, Wouter

-- 
Guillaume-Jean Herbiet, PhD
System engineer

Fondation RESTENA / dns.lu
2, avenue de l'Université
L-4365 Esch-sur-Alzette
tel.: +352.424409
fax.: +352.422473
https://www.restena.lu  https://www.dns.lu

Public key ID: 0x3A4C47C7





Re: DNS-over-TLS IPv4 interface ceases to respond

2018-07-31 Wouter Wijngaards via Unbound-users
Hi,

On 07/31/2018 09:07 AM, Guillaume-Jean Herbiet via Unbound-users wrote:
> Hello,
> 
> We are using Unbound 1.7.3 to test the DNS-over-TLS service and advanced
> options (see specifications and config file below).
> 
> The server is generally under very low load (avg. 2 queries/s) but is
> configured following the optimization guide[1] so that we can test options
> and perform stress tests.
> 
> During two low-load periods (07/15 and 07/23), we experienced an outage
> in the IPv4 DNS-over-TLS service. The server was still responsive on
> IPv6 and when queried locally using "traditional" DNS.

The so-reuseport option has in the past, for certain kernel versions
(but I don't know which), caused trouble with TCP connections,
specifically with IPv6 connections.  Since you have both IPv6 and IPv4
and only IPv4 stops responding, it looks like a similar issue, even though
it affects the other protocol and the earlier trouble was with NSD rather
than Unbound.  Back then, the solution was to turn off so-reuseport (and
they were so burned by the trouble that they haven't dared enable it
again).  That may work for you too; if it solves the problem, perhaps a
kernel upgrade can fix it?  Perhaps the server should test something (the
kernel version?) before it enables so-reuseport?
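
Applied to the configuration from the original message, the suggested workaround is a one-line change in the server: clause. A minimal sketch, assuming nothing else in that configuration is changed:

```
server:
  # Workaround under test: open the listening sockets without SO_REUSEPORT
  # (was "so-reuseport: yes" in the posted configuration).
  so-reuseport: no
```

If the IPv4 DNS-over-TLS listener stays up with this setting, that would suggest the kernel's SO_REUSEPORT handling rather than Unbound or OpenSSL, though it would not prove it.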

However, it could also be something else; I don't know what.

Best regards, Wouter






DNS-over-TLS IPv4 interface ceases to respond

2018-07-31 Guillaume-Jean Herbiet via Unbound-users
Hello,

We are using Unbound 1.7.3 to test the DNS-over-TLS service and advanced
options (see specifications and config file below).

The server is generally under very low load (avg. 2 queries/s) but is
configured following the optimization guide[1] so that we can test options
and perform stress tests.

During two low-load periods (07/15 and 07/23), we experienced an outage
in the IPv4 DNS-over-TLS service. The server was still responsive on
IPv6 and when queried locally using "traditional" DNS.

Restarting the server temporarily solved the issue.

As it happened during my holidays, I had little opportunity to
investigate. I only confirmed the (un)responsiveness using ncat:

$ ncat -6 --ssl -v  853
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: SSL connection to :853. Fondation RESTENA
Ncat: SHA-1 fingerprint: ...
^C

$ ncat -4 --ssl -v  853
Ncat: Version 7.70 ( https://nmap.org/ncat )
^C

Unfortunately, log verbosity was set to 1 and I didn't see anything
suspicious. It looked like Unbound was not even receiving the queries on
IPv4.

Has anyone already noticed such a problem? I wonder whether it is
related to Unbound itself or to the underlying OpenSSL.

$ unbound -h
Version 1.7.3
linked libs: libevent 2.0.21-stable (it uses epoll), OpenSSL 1.0.2k-fips  26 Jan 2017
linked modules: dns64 respip validator iterator

Unbound configuration:
server:
  directory: "/usr/local/unbound"
  chroot: "/usr/local/unbound"
  username: unbound
  pidfile: "/var/run/unbound.pid"

  auto-trust-anchor-file: "/var/lib/root.key"

  num-threads: 1

  msg-cache-slabs: 2
  rrset-cache-slabs: 2
  infra-cache-slabs: 2
  key-cache-slabs: 2

  rrset-cache-size: 100m
  msg-cache-size: 50m

  outgoing-range: 8192
  num-queries-per-thread: 4096

  so-rcvbuf: 4m
  so-sndbuf: 4m
  so-reuseport: yes

  interface: 127.0.0.1
  interface: ::1

  # DNS-over-TLS
  tls-service-key: "/usr/local/unbound/etc/dns_over_tls.key"
  tls-service-pem: "/usr/local/unbound/etc/dns_over_tls.pem"
  incoming-num-tcp: 100

  tls-port: 853
  interface: 127.0.0.1@853
  interface: ::1@853
  interface: @853
  interface: @853

  access-control: 0.0.0.0/0 allow
  access-control: ::/0 allow

  prefer-ip6: yes

  hide-identity: yes
  hide-version: yes
  hide-trustanchor: yes

  use-caps-for-id: yes
  qname-minimisation: yes

  harden-below-nxdomain: yes
  harden-dnssec-stripped: yes

  aggressive-nsec: yes

  prefetch: yes
  prefetch-key: yes

  rrset-roundrobin: yes

  ratelimit: 1000
  ratelimit-slabs: 2

  logfile: "/var/log/unbound.log"
  verbosity: 1
  log-time-ascii: yes
  log-queries: yes
  log-replies: yes
  val-log-level: 2
  unwanted-reply-threshold: 1000

  statistics-interval: 0
  extended-statistics: yes

# RFC 7706
# masters are sorted by increasing AXFR response time
auth-zone:
  name: "."
  for-downstream: no
  for-upstream: yes
  fallback-enabled: yes
  master: c.root-servers.net
  master: f.root-servers.net
  master: k.root-servers.net
  master: iad.xfr.dns.icann.org
  master: b.root-servers.net
  master: d.root-servers.net
  master: lax.xfr.dns.icann.org
  master: g.root-servers.net

remote-control:
  control-enable: yes

[1] https://nlnetlabs.nl/documentation/unbound/howto-optimise/


