We have been hitting this bug quite often while running Tomcat 8.5 on
Amazon Linux 2 (on AWS) with kernel 4.14.268-205.500.amzn2.x86_64.

I wanted to see whether the bug could be reproduced with a newer kernel,
so I attempted to repro it using the server code and methodology
provided by Mark Thomas on Ubuntu Server 21.10 (running on a Raspberry
Pi 4 with 4GB RAM) and was NOT able to repro the bug (kernel
5.13.0-1008-raspi). I then installed Ubuntu Server 20.04 LTS on the same
machine and WAS able to repro the bug (kernel 5.4.0-1052-raspi). The
bug was fairly easy to repro there and did not take multiple attempts.

Since then, I have been able to repro the bug using the server code on
Amazon Linux 2 with the 4.14.268-205.500.amzn2.x86_64 kernel, but not on
Amazon Linux 2 with the 5.10.109-104.500.amzn2.x86_64 kernel.

I think there is a slight problem with the server code used in the
repro: it calls `pthread_create` with no thread attributes (a NULL
`attr` argument), which creates joinable threads rather than detached
threads. The documentation for `pthread_create` says that "Only when a
terminated joinable thread has been joined are the last of its resources
released back to the system." Because the server code never joins its
threads, I think the system is never able to release their resources.
As a result, the server eventually runs out of memory and the server
program fails with "pthread_create: Cannot allocate memory", as
mentioned by Brooke Hedrick in their comment. I was also not able to
repro the bug on WSL (kernel 4.4.0-19041-Microsoft), but perhaps its
underlying network drivers are different?
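
The claim that passing no attributes yields joinable threads can be checked directly. POSIX specifies that the default value of the detachstate attribute is `PTHREAD_CREATE_JOINABLE`, and `pthread_create(&tid, NULL, ...)` uses the default attributes. The minimal sketch below (the helper name `default_detach_state` is hypothetical and not part of the repro code) asks a freshly initialized attribute object for its detach state.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the detach state carried by a freshly
   initialized attribute object, i.e. the state pthread_create() applies
   when it is called with a NULL attr. Returns -1 on failure. */
static int default_detach_state(void)
{
    pthread_attr_t attr;
    int state = -1;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) {
        fprintf(stderr, "pthread_attr_init: %s\n", strerror(rc));
        return -1;
    }
    rc = pthread_attr_getdetachstate(&attr, &state);
    if (rc != 0) {
        fprintf(stderr, "pthread_attr_getdetachstate: %s\n", strerror(rc));
        state = -1;
    }
    pthread_attr_destroy(&attr);
    return state;
}
```

On a conforming system `default_detach_state() == PTHREAD_CREATE_JOINABLE`, so every thread created that way has to be passed to `pthread_join` (or detached) before its resources can be reclaimed.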

I also ran into this issue when running the server code, so I made a
slight modification to set the thread attribute so that new threads are
created in a detached state. This seemed to solve the memory issue, and
I was able to repro the bug with this server. I've attached the code.
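
The attached server.c is not reproduced here, but a change of this kind generally looks like the minimal sketch below. It assumes a thread-per-connection accept loop; `handle_connection`, `spawn_detached` and `serve` are hypothetical names, and the handler body is a placeholder. The attribute object is initialized with `pthread_attr_init`, marked `PTHREAD_CREATE_DETACHED` with `pthread_attr_setdetachstate`, and passed to `pthread_create`, so each thread's resources are released when it exits, without any `pthread_join`.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-connection handler (placeholder body): the client
   descriptor is passed through the void * argument and closed on exit. */
static void *handle_connection(void *arg)
{
    int fd = (int)(intptr_t)arg;
    /* ... read the request and write a response on fd ... */
    close(fd);
    return NULL;
}

/* Run fn(arg) on a new detached thread. Returns 0 on success or the
   error number reported by the pthread call that failed. */
static int spawn_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    /* Destroying attr does not affect a thread already created with it. */
    pthread_attr_destroy(&attr);
    return rc;
}

/* Hypothetical accept loop: one detached thread per accepted client. */
void serve(int listen_fd)
{
    for (;;) {
        int client = accept(listen_fd, NULL, NULL);
        if (client < 0) {
            perror("accept"); /* e.g. EMFILE when out of descriptors */
            continue;
        }
        int rc = spawn_detached(handle_connection, (void *)(intptr_t)client);
        if (rc != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            close(client); /* no thread took ownership of the descriptor */
        }
    }
}
```

An equivalent alternative is to keep the NULL attribute and call `pthread_detach(pthread_self())` at the top of the handler.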

Additionally, I found it useful to use `prlimit` to raise the maximum
number of open files (`RLIMIT_NOFILE`) for the server process once it
was running. This made the server less likely to hit an EMFILE error
when calling `accept`.
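
For reference, `prlimit --pid <PID> --nofile=<soft>:<hard>` sets both limits on an already running process. If editing the server is an option, a similar effect can be had from inside the process at startup with `setrlimit(2)`, which is a different mechanism from the `prlimit` approach described above. The minimal sketch below (the helper name `raise_nofile_to_hard` is hypothetical) raises the soft `RLIMIT_NOFILE` limit up to the current hard limit, which an unprivileged process is allowed to do; raising the hard limit itself needs privileges (on Linux, `CAP_SYS_RESOURCE`).

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft limit on open files
   (RLIMIT_NOFILE) to its hard limit. Returns 0 on success, -1 on error.
   Intended to be called early in main(), before the accept loop starts. */
static int raise_nofile_to_hard(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* The soft limit may be raised only as far as the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```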


** Attachment added: "Updated server to demonstrate bug"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1924298/+attachment/5582247/+files/server.c

https://bugs.launchpad.net/bugs/1924298

Title:
  accept returns duplicate endpoints under load
