W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze:


W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze:


W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze:


W dniu 2017-09-20 o 20:36, Cong Wang pisze:
On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
but dmesg at this time shows nothing about interfaces or flaps.

This is very odd.

We only free netdevice in free_netdev() and it is only called when
we unregister a netdevice. Otherwise pcpu_refcnt is impossible
to be NULL.
If there is a missing dev_hold() or one dev_put() in excess,
this would allow the netdev to be freed too soon.

-> Use after free.
memory holding netdev could be reallocated-cleared by some other kernel
user.

Sure, but only unregister could trigger a free. If there is no unregister,
like what Pawel claims, then there is no free, the refcnt just goes to
0 but the memory is still there.

About possible mistake from my side with bisect - i can judge too early that some bisect was good
the road was:
git bisect start
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using stack larger than 1024.
git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 'udp-reduce-cache-pressure'
git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
# bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch 's390-net-updates-part-2'
git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
# good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch 'bpf-ctx-narrow'
git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
# good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove cp_outgoing
git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
# bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix
git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
# good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new function dst_dev_put()

And currently have this running for about 4 hours without problems.



git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
# bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove DST_NOCACHE flag

Here for sure - panic

git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
# bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly
git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
# good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
# bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree

im not 100% sure tor last two
Will test them again starting from
[95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly


git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
# bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()



git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
# first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()



What i can say more
I can reproduce this on any server with similar configuration
the difference can be teamd instead of bonding
ixgbe or i40e and mlx5
Same problems

vlans - more or less prefixes learned from bgp -> zebra -> netlink -> kernel But normally in lab when using only plain routing no bgpd and about 128 vlans - with 128 routes - cant reproduce this - this apperas only with bgp - minimum where i can reproduce this was about 130k prefixes with about 286 nexthops




bisected again and same result:
b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
Author: Wei Wang <wei...@google.com>
Date:   Sat Jun 17 10:42:32 2017 -0700

    ipv4: mark DST_NOGC and remove the operation of dst_free()

    With the previous preparation patches, we are ready to get rid of the
    dst gc operation in ipv4 code and release dst based on refcnt only.
    So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
    to dst_free().
    At this point, all dst created in ipv4 code do not use the dst gc
    anymore and will be destroyed at the point when refcnt drops to 0.

    Signed-off-by: Wei Wang <wei...@google.com>
    Acked-by: Martin KaFai Lau <ka...@fb.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>

:040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M      net

Will add now version 2 of patch from Eric and we will see


after adding patch
perf top catch
   PerfTop:   77159 irqs/sec  kernel:99.7%  exact:  0.0% [4000Hz cycles],  (all, 40 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    60.95%  [kernel]            [k] dev_put.part.6
     4.00%  [kernel]            [k] ixgbe_poll
     3.63%  [kernel]            [k] irq_entries_start
     1.22%  [kernel]            [k] fib_table_lookup
     1.15%  [kernel]            [k] do_raw_spin_lock
     1.05%  [kernel]            [k] ixgbe_xmit_frame_ring
     1.04%  [kernel]            [k] lookup
     0.87%  [kernel]            [k] eth_type_trans


no panic on console - rebooting to check logs

Reply via email to