Re: TCP stops sending packets over loopback on 4.10-rc3?

Josef Bacik Wed, 25 Jan 2017 06:27:29 -0800

On Wed, Jan 25, 2017 at 9:14 AM, Josef Bacik <[email protected]> wrote:

On Tue, Jan 24, 2017 at 9:07 AM, Eric Dumazet<[email protected]> wrote:
On Tue, 2017-01-24 at 06:20 -0500, Josef Bacik wrote:
 Hello,
 I've been trying to test some NBD changes I had made recently and I
 started having packet timeouts.  I traced this down to tcp just
stopping sending packets after a lot of writing. All NBD does iscall
 kernel_sendmsg() with a request struct and some pages when it does
writes. I did a bunch of tracing and I've narrowed it down torunning
 out of sk_wmem_queued space.  In tcp_sendmsg() here

 new_segment:
/* Allocate new segment. If the interfaceis SG,
                          * allocate skb fitting to single page.
                          */
                         if (!sk_stream_memory_free(sk))
                                 goto wait_for_sndbuf;

 we hit this pretty regularly, and eventually just get stuck in
 sk_stream_wait_memory until the timeout ends and we error out
everything. Now sk_stream_memory_free checks the sk_wmem_queuedandcalls into the sk_prot->stream_memory_free(), so I broke this outlike
 the following


     if (sk->sk_wmem_queued >= sk->sk_sndbuf) {
         trace_printk("sk_wmem_queued %d, sk_sndbuf %d\n",
 sk->sk_wmem_queued, sk->sk_sndbuf);
         goto wait_for_sndbuf;
      }
      if (sk->sk_prot->stream_memory_free &&
 !sk->sk_prot->stream_memory_free(sk)) {
         trace_printk("sk_stream_memory_free\n");
         goto wait_for_sndbuf;
      }

 And I got this in my tracing

    kworker/u16:5-112   [001] ....  1375.637564: tcp_sendmsg:
 sk_wmem_queued 4204872, sk_sndbuf 4194304
    kworker/u16:5-112   [001] ....  1375.639657: tcp_sendmsg:
 sk_wmem_queued 4204872, sk_sndbuf 4194304
    kworker/u16:5-112   [003] ....  1375.641128: tcp_sendmsg:
 sk_wmem_queued 4204872, sk_sndbuf 4194304
    kworker/u16:5-112   [003] ....  1375.643441: tcp_sendmsg:
 sk_wmem_queued 4204872, sk_sndbuf 4194304
    kworker/u16:5-112   [001] ....  1375.807614: tcp_sendmsg:
 sk_wmem_queued 4204872, sk_sndbuf 4194304
    kworker/u16:5-112   [001] ....  1377.538744: tcp_sendmsg:
 sk_wmem_queued 4204872, sk_sndbuf 4194304
    kworker/u16:5-112   [001] ....  1377.543418: tcp_sendmsg:
 sk_wmem_queued 4204872, sk_sndbuf 4194304
     kworker/2:4H-1535  [002] ....  1377.544685: tcp_sendmsg:
 sk_wmem_queued 4204872, sk_sndbuf 4194304
    kworker/u16:5-112   [000] ....  1379.378352: tcp_sendmsg:
 sk_wmem_queued 4205796, sk_sndbuf 4194304
    kworker/u16:5-112   [003] ....  1380.985721: tcp_sendmsg:
 sk_wmem_queued 4212416, sk_sndbuf 4194304
This is as far as I've gotten and I'll keep digging into it, but Iwas
 wondering if this looks familiar to anybody?  Also one thing I've
 noticed is sk_stream_wait_memory() will wait on sk_sleep(sk), but
basically nothing wakes this up. For example it seems the mainway we
 reduce sk_wmem_queued is through sk_wmem_free_skb(), which doesn't
appear to wake anything up in any of its callers, so anybody whodoes
 end up sleeping will basically never wake up.  That seems like it
should be more broken than it is, so I'm curious to know howthings are
 actually woken up in this case.  Thanks,
git grep -n SOCK_QUEUE_SHRUNK

-> tcp_check_space()
But tcp_check_space() doesn't actually reduce sk_wmem_queued fromwhat I can see. The only places that appear to reduce it aretcp_trim_head, which is only called in the retransmit path, andsk_wmem_free_skb, which seems to be right, but I added atrace_printk() in it to see if it was firing during my test and itnever fires. So we _appear_ to only ever be incrementing thiscounter, but never decrementing it. I'm doing a bunch of tracingtrying to figure out what is going on here but so far nothing ispopping which is starting to make me think ftrace is broken. Thanks,

Nope ftrace isn't broken, I'm just dumb, the space is being reclaimedby sk_wmem_free_skb(). So I guess I need to figure out why I stopgetting ACK's from the other side of the loopback. Thanks,


Josef

Re: TCP stops sending packets over loopback on 4.10-rc3?

Reply via email to