Re: panic with tcp timers

Bjoern A. Zeeb Fri, 17 Jun 2016 07:49:07 -0700

On 17 Jun 2016, at 4:53, Gleb Smirnoff wrote:

Hi!


  At Netflix we are observing a race in TCP timers on head.
The problem is a regression that doesn't happen on stable/10.
The panic usually happens after several hours at 55 Gbit/s of
traffic.

What happens is that tcp_timer_keep finds t_tcpcb being
NULL. Some coredumps have the tcpcb already initialized,
with a non-NULL t_tcpcb and in TCPS_ESTABLISHED state, which
means that another CPU was working on the tcpcb while
the faulted one was working on the panic. So this all looks
like a use after free that conflicts with a new allocation.
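
The pattern described above can be sketched in userland C. This is only an
illustration under stated assumptions, not the kernel code: struct conn,
conn_alloc(), conn_free(), timer_fire() and run_scenario() are hypothetical
names, and a one-slot static pool stands in for an allocator that hands the
same memory to the next connection. A timer holding a stale pointer sees a
NULL back pointer if it fires right after the free, and a fully initialized
object belonging to a different connection if it fires after the slot has
been reused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
enum conn_state { CONN_CLOSED, CONN_ESTABLISHED };

struct conn {
	int in_use;
	int id;
	enum conn_state state;
	struct conn *owner;	/* plays the role of the back pointer */
};

/* One-slot pool: a freed object is handed out again on the next alloc. */
static struct conn pool[1];

static struct conn *
conn_alloc(int id)
{
	struct conn *c = &pool[0];

	if (c->in_use)
		return (NULL);
	c->in_use = 1;
	c->id = id;
	c->state = CONN_ESTABLISHED;
	c->owner = c;
	return (c);
}

static void
conn_free(struct conn *c)
{
	assert(c != NULL && c->in_use);
	c->owner = NULL;
	c->state = CONN_CLOSED;
	c->in_use = 0;
}

/*
 * Timer callback run with a possibly stale pointer.
 * Returns -1 when the back pointer is NULL (the case that panics in the
 * report), otherwise the id of whichever connection owns the slot now.
 */
static int
timer_fire(const struct conn *c)
{
	if (c->owner == NULL)
		return (-1);
	return (c->id);
}

/* Free connection 1 while a timer still references it, then reuse the slot. */
void
run_scenario(int *before_reuse, int *after_reuse)
{
	struct conn *old = conn_alloc(1);
	struct conn *stale = old;	/* pending timer keeps this pointer */

	conn_free(old);
	*before_reuse = timer_fire(stale);	/* sees the NULL back pointer */
	(void)conn_alloc(2);			/* slot reused by a new connection */
	*after_reuse = timer_fire(stale);	/* sees connection 2's state */
}
```

Calling run_scenario() from a small main() shows both outcomes: the first
timer run hits the NULL back pointer, the second one silently operates on
the new connection. The sketch is single-threaded, so it shows only the
stale-pointer effect, not the cross-CPU timing.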

Comparing stable/10 and head, I see two changes that could
affect that:

- callout_async_drain
- switch to READ lock for inp info in tcp timers

That's why you are in To, Julien and Hans :)

We continue investigating, and I will keep you updated.
However, any help is welcome. I can share cores.


There’s also the change to no longer mark the zones NO_FREE.

In theory I was convinced at the time that it should not be an issue anymore.

If I had overlooked something or follow-up timer changes invalidated assumptions, then that could also be trouble.

That said, I was not able to get any related panics or log entries anymore lately (but I am currently slightly behind head with my branch).

We should get the problem fixed, however, and not try to “paint over” again.


/bz
