Hi,

On Wed, Jun 6, 2018 at 6:03 PM, Tobias Hommel <netdev-l...@genoetigt.de> wrote:
> Sorry no progress until now, I currently do not get time to have a deeper look
> into that. We're back to 4.1.6 right now.

Thanks for letting me know. In the project I am currently involved in, we unfortunately don't have the option of reverting the kernel, so we are finding ways to live with the error. We have been looking into the error a bit more, and have made the following observations:

* First of all, as discussed earlier in the thread, the error is triggered by dst_orig being NULL. Our current work-around is simply to return from xfrm_lookup if dst_orig is NULL. This seems to work fine, and the error doesn't happen that often (in our use-cases at least).

* The machine we use for testing (and where we first saw the error) is used as initiator.

* When we compare the logs from Strongswan with the ones from the kernel, it seems that the error is typically triggered when a tunnel is torn down or about to come up. We need quite a lot of tunnels for the error to trigger, usually 30 or more. I guess this might point to some race or some condition not being met when packets are sent/received.

* We see the error much more frequently when hardware encryption is enabled.

* Yesterday, we upgraded the kernel from 4.14.34 to 4.14.48, and the error now happens much less frequently. I see that 4.14.48 includes several IPsec fixes (for example the previously mentioned "xfrm: Fix a race in the xdst pcpu cache.").

BR,
Kristian