[ https://issues.apache.org/jira/browse/PROTON-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399116#comment-16399116 ]
Miha Plesko commented on PROTON-1791:
-------------------------------------

Thanks [~aconway], will try your patch tomorrow when I return to my office, but I think it should work since it's the same as my current workaround (see previous comment).

Let me quickly explain how we're able to reproduce the problem. We're being disconnected from ActiveMQ by an "amqp:resource-limit-exceeded: local-idle-timeout expired" error every 2 minutes or so, and we create a new container immediately on error (in a loop). Then the sockets get stuck as described above.

Q: Do you happen to know why ActiveMQ would be sending us the "local-idle-timeout expired" error every 2 minutes? Does the qpid_proton gem not send heartbeats, so ActiveMQ thinks we're dead? Looking at this comment here https://github.com/apache/qpid-proton/blob/master/proton-c/bindings/ruby/lib/core/container.rb#L271 the explanation seems quite possible.

Q2: Assuming the core problem is that qpid_proton isn't sending heartbeats, is there an easy workaround for this? Would manually opening a sender and sending a dummy message to ActiveMQ help?

Thank you in advance. We're in a great hurry to get this issue resolved (this week), so I kindly ask for a quick reply.

Regards,
Miha

> TCP sockets remain open in CLOSE_WAIT state
> -------------------------------------------
>
> Key: PROTON-1791
> URL: https://issues.apache.org/jira/browse/PROTON-1791
> Project: Qpid Proton
> Issue Type: Bug
> Components: ruby-binding
> Affects Versions: proton-c-0.21.0
> Environment: Confirmed on Ubuntu 16.04 and RHEL 7.4
> Confirmed on qpid_proton 0.19.0 and 0.21.0
> Reporter: Miha Plesko
> Assignee: Alan Conway
> Priority: Major
> Labels: bug
> Fix For: proton-c-0.22.0
>
>
> Hi guys,
> thanks for developing the awesome qpid_proton ruby gem, we're using it on
> daily basis!
> However, recently we noticed following error in our server log:
>
> Too many open files - socket(2) for "172.16.117.189" port 5672
>
> After some research it turns out that qpid_proton process is having
> increasingly more and more following file descriptors open:
>
> $ lsof -ap 108533
> ruby 108533 miha 116u IPv4 562438 0t0 TCP 172.16.117.189:53626->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 197u IPv4 561644 0t0 TCP 172.16.117.189:53630->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 311u IPv4 560657 0t0 TCP 172.16.117.189:53634->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 549u IPv4 565342 0t0 TCP 172.16.117.189:53642->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 576u IPv4 565122 0t0 TCP 172.16.117.189:53650->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 603u IPv4 565738 0t0 TCP 172.16.117.189:53654->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 630u IPv4 563021 0t0 TCP 172.16.117.189:53658->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 657u IPv4 568361 0t0 TCP 172.16.117.189:53662->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 666u IPv4 563027 0t0 TCP 172.16.117.189:53666->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 675u IPv4 567538 0t0 TCP 172.16.117.189:53670->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 684u IPv4 567998 0t0 TCP 172.16.117.189:53678->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 690u IPv4 574709 0t0 TCP 172.16.117.189:53686->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 693u IPv4 578725 0t0 TCP 172.16.117.189:53694->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 696u IPv4 576840 0t0 TCP 172.16.117.189:53698->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 699u IPv4 577819 0t0 TCP 172.16.117.189:53702->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 702u IPv4 582192 0t0 TCP 172.16.117.189:53710->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 705u IPv4 582861 0t0 TCP 172.16.117.189:53714->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 708u IPv4 577363 0t0 TCP 172.16.117.189:53718->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 711u IPv4 578175 0t0 TCP 172.16.117.189:53722->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 714u IPv4 587172 0t0 TCP 172.16.117.189:53730->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 717u IPv4 584387 0t0 TCP 172.16.117.189:53734->147.75.102.132:amqp (CLOSE_WAIT)
> ...
>
> I think the CLOSE_WAIT status of file descriptor indicates that the TCP
> connection has already been closed, but the file descriptor wasn't closed.
> After 9 hours or so there are enough of such file descriptors for OS to
> complain about it.
>
> We did all we could to close connections gracefully:
>
> connection.container.stop
> connection.close
> connection = nil
>
> but nothing seems to help. A simple but expensive workaround is to manually
> invoke Ruby's garbage collection,
> but ideally `connection.close` would close the file descriptor.
>
> May I kindly ask you to look at this?
>
> Thank you and Best Regards,
> Miha
>
> PS: The error occurs both on Ubuntu 16.04 and RHEL 7.4
> PS2: The error occurs both on qpid_proton 0.19.0 and 0.21.0

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
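
For readers unfamiliar with the CLOSE_WAIT state described in the report, the following is a minimal sketch that reproduces it using only Ruby's standard `socket` library. It does not use qpid_proton, and it is not the patch discussed above. The helper names `half_closed_client` and `drain_and_close` are hypothetical, chosen for this illustration. The idea: once the remote peer hangs up, the local socket sits in CLOSE_WAIT until the application closes its own file descriptor.

```ruby
require "socket"

# Hypothetical helper: connect to a throwaway local server and let the
# server side hang up first. While the returned socket stays open, the
# kernel keeps the connection in CLOSE_WAIT (visible with lsof on Linux).
def half_closed_client
  server = TCPServer.new("127.0.0.1", 0)           # port 0: the OS picks a free port
  client = TCPSocket.new("127.0.0.1", server.addr[1])
  peer = server.accept
  peer.close                                       # remote side closes first (sends FIN)
  server.close
  client
end

# Hypothetical helper: read until EOF, then always release the descriptor.
# The ensure block is what moves the socket out of CLOSE_WAIT.
def drain_and_close(sock)
  sock.read                                        # returns "" once the peer has closed
ensure
  sock.close unless sock.closed?
end

sock = half_closed_client
puts "open before explicit close: #{!sock.closed?}"
drain_and_close(sock)
puts "closed after explicit close: #{sock.closed?}"
```

Under these assumptions, a reconnect loop that drops the old socket without closing it would accumulate one such descriptor per iteration until garbage collection happens to finalize it, which would be consistent with the observation that forcing a GC run works around the problem.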