Hi Holger,
On Fri, Jun 10, 2016 at 04:32:55PM +0200, Holger Just wrote:
> Hi Willy et al.,
>
> > Thank you for this report, it helps. How often does it happen, and/or how
> > long on average after you start it? What's your workload? Do you use
> > SSL, compression, TCP and/or HTTP mode, peers synchronization, etc.?
>
> Yesterday, we upgraded from 1.5.14 to 1.5.18 and then observed exactly
> this issue in production. After rolling back to 1.5.14, it no longer
> occurred.
>
> We have mostly HTTP traffic and a little TCP, at about 100-200 req/s and
> about 2000 concurrent connections overall. Almost all traffic is
> SSL-terminated. We use no peer synchronization and no compression.
>
> Running strace on the process shows this (most of the calls are
> epoll_wait):
>
> [...]
> epoll_wait(0, {}, 200, 0) = 0
> epoll_wait(0, {}, 200, 0) = 0
> epoll_wait(0, {}, 200, 0) = 0
> epoll_wait(0, {}, 200, 0) = 0
> epoll_wait(0, {}, 200, 0) = 0
> epoll_wait(0, {}, 200, 0) = 0
> epoll_wait(0, {}, 200, 0) = 0
> epoll_wait(0, {{EPOLLIN, {u32=796, u64=796}}}, 200, 0) = 1
> read(796, "\357\275Y\231\275'b\5\216#\33\220\337'\370\312\215sG4\316\275\277y-%\v\v\211\331\342"..., 5872) = 1452
> read(796, 0x9fa26ec, 4420) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_wait(0, {}, 200, 0) = 0
> epoll_wait(0, {}, 200, 0) = 0
> epoll_wait(0, {}, 200, 0) = 0
> epoll_wait(0, {}, 200, 0) = 0
> [...]
Thank you for the report. I'll inspect the SSL part in case I've missed
something. Don't take any risks in production, of course.
Best regards,
Willy