> https://lab.nexedi.com/kirr/iproute2/blob/bd480e66/t/rtcache-torture
> (also attached to this email)
>
> which reproduces the problem in several minutes just on one computer and
> retested it locally: I can reliably reproduce the issue on pristine
> Debian 3.16.7-ckt25-2 (on both Atom and Core2 notebooks) and on pristine
> 3.16.35 on Atom (compiled by me, since Debian kernel team has not yet
> uploaded 3.16.35 to Jessie).
I have been running this script on four different machines for hours
now without reproducing your bug on 4.4 or later kernels. It does
trigger on a 3.14 kernel. (It helps to do a killall fping6 before
exiting!)

At one level, I'm relieved: one less babel bug to worry about in
openwrt (now 4.4 based), although one of the platforms I work on is
still stuck at 3.18, and the c2 at 3.14 (for now).

At another level I still really, really, really wanted atomic updates
in babel, and was clearing the decks to make a run at the right
netlink stuff when I decided to check whether your bug existed in my
kernels. :( Weirdly demotivating.

d@dancer:~/bin$ ssh root@pi3 uname -a
Linux pi3 4.4.12-v7+ #892 SMP Thu Jun 2 15:41:19 BST 2016 armv7l GNU/Linux
d@dancer:~/bin$ ssh root@pi2 uname -a
Linux pi2 4.4.12-v7+ #892 SMP Thu Jun 2 15:41:19 BST 2016 armv7l GNU/Linux
d@dancer:~/bin$ uname -a
Linux dancer 4.5.0-rc7-fqfi #1 SMP PREEMPT Mon Mar 7 16:04:17 PST 2016 x86_64 x86_64 x86_64 GNU/Linux
...
The odroid C2 has the bug.
d@dancer:~/bin$ ssh root@c2 uname -a
Linux c2 3.14.29-56 #1 SMP PREEMPT Wed Apr 20 12:15:54 BRT 2016 aarch64 aarch64 aarch64 GNU/Linux
BUG: Got unexpected unreachable route for 2226:::::1: # I'd changed the number
unreachable 2226:::::1 from :: dev lo src fd99::2 metric 0 \cache error -101
route table for root 2226::::/48
8<
unicast 2226:::::/64 dev dum0 proto boot scope global metric 1024
unreachable 2226::::/48 dev lo proto boot scope global metric 1024 error -101
8<
route for 2226:::::1 (once again)
unreachable 2226:::::1 from :: dev lo src fd99::2 metric 0 \cache error -101 users 1 used 3
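
The "BUG:" condition in the dump above comes down to the looked-up route
coming back as type "unreachable" even though a /64 unicast route covers
the address. A minimal sketch of that classification follows, assuming
the lookup output begins with the route type as it does in the dump.
route_is_unreachable is a hypothetical helper name and is not part of
rtcache-torture; the sample text is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of rtcache-torture): succeeds when the
# route description passed as $1 starts with the "unreachable" type.
route_is_unreachable() {
    case "$1" in
        unreachable\ *) return 0 ;;
        *) return 1 ;;
    esac
}

# Canned sample copied from the dump above, so this runs without root
# and without touching the routing table.
sample='unreachable 2226:::::1 from :: dev lo src fd99::2 metric 0'
if route_is_unreachable "$sample"; then
    echo "BUG: Got unexpected unreachable route"
fi
```

On a live system the argument would be whatever the lookup printed for
the test address; how that lookup is done without being serialized by
the RTNL lock is the script's concern and is not shown here.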
>
> It is always the same: the issue reproduces reliably in several minutes.
> And it looks like e.g.
>
> - 8<
> root@mini:/home/kirr/src/tools/net/iproute2/t# time ./rtcache-torture
> PING :::::1(:::::1) 56 data bytes
> E.E.E.E..E..EE...E..
>
>
> BUG: Linux mini 3.16.35-mini64 #14 SMP PREEMPT Sun Jun 12 19:41:09 MSK 2016 x86_64 GNU/Linux
> BUG: Got unexpected unreachable route for :::::1:
> unreachable :::::1 from :: dev lo src 2001:67c:1254:20::1 metric 0 \cache error -101
>
> route table for root ::::/48
> 8<
> unicast :::::/64 dev dum0 proto boot scope global metric 1024
> unreachable ::::/48 dev lo proto boot scope global metric 1024 error -101
> 8<
>
> route for :::::1 (once again)
> unreachable :::::1 from :: dev lo src 2001:67c:1254:20::1 metric 0 \cache error -101 users 1 used 4
>
> real 0m49.938s
> user 0m4.488s
> sys 0m5.872s
> 8<
>
> The issue should not show itself with kernels >= 4.2, because there the
> lookup procedure does not take table lock twice, and /128 cache entries
> are not routinely created (they are created only upon PMTU exception).
>
> I'm running Debian testing on my development machine. Currently it has
> 4.5.5-1 (2016-05-29). I can confirm that /128 route cache entries are
> not created there just because a route was looked up.
>
> Kirill
>
>
> 8< (rtcache-torture)
> #!/bin/sh -e
> # torture for IPv6 RT cache, trying to hit the race between lookup,cache-add & route add
> # http://lists.alioth.debian.org/pipermail/babel-users/2016-June/002547.html
>
>
> tprefix=:: # "whole-network" prefix for tests /48
> tsubnet=$tprefix: # subnetwork for which "to" route will be changed /64
> taddr=$tsubnet::1 # test address on $tsubnet
>
> # setup for tests:
>
> # dum0 dummy device
> ip link del dev dum0 2>/dev/null || :
> ip link add dum0 type dummy
> ip link set up dev dum0
>
> # clean route table for tprefix with only unreachable whole-network route
> ip -6 route flush root $tprefix::/48
> ip -6 route add unreachable $tprefix::/48
> ip -6 route flush cache
>
> ip -6 route add $tsubnet::/64 dev dum0
>
>
> # put a lot of requests to rt/rtcache getting route to $taddr
> trap 'kill $(jobs -p)' EXIT
> rtgetter() {
> # NOTE we cannot do this with `ip route get ...` in a loop, as `ip route
> # get` first takes RTNL lock, and thus will be completely serialized with
> # e.g. route add and del.
> #
>