Hi all,

A silly question: is it possible for a ~5Mbps constant stream of NTP
client traffic to an OpenBSD router over an ADSLv2 link to cause said
OpenBSD router to stop communicating successfully with its internal
network?  This will take some explaining, so please bear with me. :-)

A few weeks back I was debugging an issue with an old Advantech
UNO-1150G (which uses Realtek Ethernet interfaces) that was dropping off
the network randomly.  After some experiments, there was a strong
suggestion that it was hardware, so an APU2 was purchased to replace
the Advantech box.

I installed OpenBSD 6.3 on that new box¹, set it up with the same
configuration settings as before, and away it went.  Same cron jobs too,
so I got an email whenever the APU2 lost contact with the internal
network for a period.

The problem continued:
> Sun May 20 04:13:53 AEST 2018
> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:0d:b9:4a:f9:f8
>         index 1 priority 0 llprio 3
>         media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>         status: active
>         inet 172.31.249.254 netmask 0xffffff00 broadcast 172.31.249.255
>         inet6 fe80::757c:2f2f:fa68:93c%em0 prefixlen 64 scopeid 0x1
>         inet6 2001:44b8:21ac:70f9::fe prefixlen 64
> Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
> em0     1500  <Link>      00:0d:b9:4a:f9:f8  2261254     0  3642705     0     0
> em0     1500  172.31.249/ 172.31.249.254     2261254     0  3642705     0     0
> em0     1500  fe80::%em0/ fe80::757c:2f2f:f  2261254     0  3642705     0     0
> em0     1500  2001:44b8:2 2001:44b8:21ac:70  2261254     0  3642705     0     0
> Sun May 20 04:14:53 AEST 2018
> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:0d:b9:4a:f9:f8
>         index 1 priority 0 llprio 3
>         media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>         status: active
>         inet 172.31.249.254 netmask 0xffffff00 broadcast 172.31.249.255
>         inet6 fe80::757c:2f2f:fa68:93c%em0 prefixlen 64 scopeid 0x1
>         inet6 2001:44b8:21ac:70f9::fe prefixlen 64
> Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
> em0     1500  <Link>      00:0d:b9:4a:f9:f8  2262020     0  3643633     0     0
> em0     1500  172.31.249/ 172.31.249.254     2262020     0  3643633     0     0
> em0     1500  fe80::%em0/ fe80::757c:2f2f:f  2262020     0  3643633     0     0
> em0     1500  2001:44b8:2 2001:44b8:21ac:70  2262020     0  3643633     0     0
> Sun May 20 04:15:53 AEST 2018
> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:0d:b9:4a:f9:f8
>         index 1 priority 0 llprio 3
>         media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
>         status: active
>         inet 172.31.249.254 netmask 0xffffff00 broadcast 172.31.249.255
>         inet6 fe80::757c:2f2f:fa68:93c%em0 prefixlen 64 scopeid 0x1
>         inet6 2001:44b8:21ac:70f9::fe prefixlen 64
> Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
> em0     1500  <Link>      00:0d:b9:4a:f9:f8  2262324     0  3643805     0     0
> em0     1500  172.31.249/ 172.31.249.254     2262324     0  3643805     0     0
> em0     1500  fe80::%em0/ fe80::757c:2f2f:f  2262324     0  3643805     0     0
> em0     1500  2001:44b8:2 2001:44b8:21ac:70  2262324     0  3643805     0     0
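
As a rough aid to reading those counters, here is a small Python sketch
that turns the three em0 samples into per-minute packet deltas.  The
numbers are copied from the output above; the variable and function
names are made up for illustration and are not part of the cron job.

```python
# (sample time, Ipkts, Opkts) copied from the three em0 samples above.
samples = [
    ("04:13:53", 2261254, 3642705),
    ("04:14:53", 2262020, 3643633),
    ("04:15:53", 2262324, 3643805),
]


def deltas(samples):
    """Return (end_time, ipkts_delta, opkts_delta) per consecutive pair."""
    out = []
    for (_, in0, out0), (t1, in1, out1) in zip(samples, samples[1:]):
        out.append((t1, in1 - in0, out1 - out0))
    return out


for end_time, d_in, d_out in deltas(samples):
    print(f"minute ending {end_time}: {d_in} in, {d_out} out")
```

The second interval carries noticeably fewer packets than the first, and
it ends at the sample whose media line has gained `master`.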

Now, no errors were being reported, and the APU2 was *more* stable than
the Advantech box, but the problem persisted.

One might think the blame lies with my hacked-up switch.  I've got a
replacement switch (a Netgear GS748T) which I will install in due
course, but a curious thing happened this month.

The border router here at the time was a participant in the public NTP
server pool (http://pool.ntp.org).  This month, I got an email from my
ISP to say I was over quota.  Sure enough, I had sustained the
equivalent of 50GB/day "downloads" over the course of a week.

When I had a look with `tcpdump` on pppoe0, sure enough, it was NTP
client request traffic.
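
As a sanity check on those figures, the following Python sketch converts
a sustained ~5Mbps into a daily volume and a rough request rate.  The
packet size is an assumption: a plain NTP request (48 bytes) plus UDP
and IPv4 headers, ignoring PPPoE/ATM overhead, IPv6 clients and any
authentication fields, so the request rate is only a ballpark.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: 48-byte NTP payload + 8-byte UDP header + 20-byte IPv4
# header.  Link-layer overhead is ignored.
NTP_REQUEST_BYTES = 48 + 8 + 20


def gb_per_day(bits_per_second):
    """Daily volume in decimal gigabytes for a sustained bit rate."""
    return bits_per_second / 8 * SECONDS_PER_DAY / 1e9


def requests_per_second(bits_per_second, packet_bytes=NTP_REQUEST_BYTES):
    """Approximate packet rate if every packet is one NTP request."""
    return bits_per_second / 8 / packet_bytes


rate = 5_000_000  # ~5Mbps, as quoted at the top of this post
print(f"{gb_per_day(rate):.0f} GB/day")
print(f"{requests_per_second(rate):.0f} requests/s (rough)")
```

So ~5Mbps sustained is in the same range as the 50GB/day the ISP
reported, and under the assumptions above it corresponds to something
like eight thousand requests per second.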

https://community.ntppool.org/t/excessive-traffic-to-australian-servers/791
shows when this problem started this month.  I'm expecting a whopper of
an Internet bill this month.

I have since pulled my server out of the pool (did this on the 1st this
month), and while I still have some 700-odd residual clients (scattered
about the globe), the significant incoming traffic has ceased, and with
it, so has my "drop-out" problem with the border router.

The cron job I set up in that previous thread has not reported a single
issue with the internal network.  The APU2 has been *rock solid* since then.

I've made modifications to my pf.conf since I pulled out of the pool but
if there's interest, I can boot up the old router and pull the pf.conf
and ntpd.conf off it, since that's pretty much what I was using (with
em0 substituted for rl0 for pf.conf).

My thinking, since the problem has disappeared, is that the sheer number
of clients was overwhelming the router, and as a result, it didn't have
enough buffer space to handle the number of separate hosts requesting
the time from it.
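
One way to put a number on that hypothesis, assuming the resource being
exhausted is pf's state table (nothing in this post establishes that),
is the sketch below.  It assumes every request creates a fresh state
that lives for the full timeout, which is a worst case.  All three
inputs are placeholders, not measurements: the real timeout and state
limit would have to be read from `pfctl -s timeouts` and
`pfctl -s memory` on the router.

```python
def estimated_states(new_flows_per_second, state_timeout_seconds):
    """Steady-state count if each flow holds one state until timeout."""
    return new_flows_per_second * state_timeout_seconds


def exceeds_limit(new_flows_per_second, state_timeout_seconds, state_limit):
    """True if the estimated steady-state count is above the limit."""
    return estimated_states(new_flows_per_second,
                            state_timeout_seconds) > state_limit


# Placeholder values for illustration only; substitute the real ones.
flows = 8000      # hypothetical new client flows per second
timeout = 30      # hypothetical UDP state timeout, in seconds
limit = 10000     # hypothetical state table limit

print(estimated_states(flows, timeout), exceeds_limit(flows, timeout, limit))
```

If the real numbers look anything like these placeholders, comparing the
current state count in `pfctl -s info` against the limit while the
traffic is flowing would confirm or rule this out.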

It's highly likely this is some naïve mistake on my part in
configuring pf.conf, and that with more appropriate rules, the problem
would disappear.

Has anyone had similar issues, or can anyone think of ways to mitigate
such problems?

Regards,
-- 
Stuart Longland (aka Redhatter, VK4MSL)

I haven't lost my mind...
  ...it's backed up on a tape somewhere.

1. dmesg output is here:
https://marc.info/?l=openbsd-misc&m=152672107525374&w=2
