[jira] [Commented] (PROTON-1791) TCP sockets remain open in CLOSE_WAIT state

Miha Plesko (JIRA) Wed, 14 Mar 2018 11:59:17 -0700

    [ https://issues.apache.org/jira/browse/PROTON-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399116#comment-16399116 ]


Miha Plesko commented on PROTON-1791:
-------------------------------------

Thanks [~aconway], I will try your patch tomorrow when I'm back in the office,
but I think it should work since it's the same as my current workaround (see my
previous comment). Let me quickly explain how we're able to reproduce the
problem. We're being disconnected from ActiveMQ with an
"amqp:resource-limit-exceeded: local-idle-timeout expired" error every 2
minutes or so, and we create a new container immediately on each error (in a
loop). Then the sockets get stuck in CLOSE_WAIT, as described in the issue.
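
To illustrate what we expect from `connection.close`, here is a minimal sketch
using plain Ruby sockets rather than the qpid_proton API. The helpers
`start_dropping_server` and `reconnect_loop` are hypothetical, and the local
server only stands in for a broker that drops each connection. Every reconnect
attempt closes its own socket in `ensure`, so the descriptor is released as
soon as the peer hangs up instead of sitting in CLOSE_WAIT until the garbage
collector finalizes it.

```ruby
require "socket"

# Hypothetical stand-in for the broker: accepts `connections` clients and
# closes each one immediately, like a peer dropping us on an idle timeout.
def start_dropping_server(connections)
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  thread = Thread.new do
    connections.times { server.accept.close } # our end now sees EOF
  end
  [server, thread]
end

# Hypothetical reconnect loop: one socket per attempt, always closed in
# `ensure`, so no descriptor is left behind in CLOSE_WAIT.
def reconnect_loop(port, attempts)
  Array.new(attempts) do
    sock = TCPSocket.new("127.0.0.1", port)
    begin
      sock.read # blocks until the peer closes, then returns ""
    ensure
      sock.close # release the file descriptor deterministically
    end
    sock
  end
end

server, thread = start_dropping_server(3)
sockets = reconnect_loop(server.local_address.ip_port, 3)
thread.join
server.close
puts sockets.all?(&:closed?) # => true
```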

Q: Do you happen to know why ActiveMQ would be sending us a "local-idle-timeout
expired" error every 2 minutes? Does the qpid_proton gem not send heartbeats, so
ActiveMQ thinks we're dead? Judging by this comment:
https://github.com/apache/qpid-proton/blob/master/proton-c/bindings/ruby/lib/core/container.rb#L271
that explanation seems quite plausible.

Q2: Assuming the core problem is that qpid_proton isn't sending heartbeats, is
there an easy workaround? Would manually opening a sender and sending a dummy
message to ActiveMQ help?
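
For Q and Q2, this is our reading of the AMQP 1.0 idle-timeout rule, as a small
Ruby sketch. The method names are ours, not qpid_proton's, and treating 0 as
"no timeout" plus the extra halving are assumptions on our side. Each peer
advertises an idle-time-out in its open frame, and any frame sent to it,
whether an empty heartbeat frame or a transfer carrying a dummy message, should
reset that peer's timer.

```ruby
# Latest time (ms) by which we should send *some* frame so that the peer's
# idle timeout does not fire. remote_idle_timeout_ms is the idle-time-out
# the peer advertised in its open frame; 0 is treated as "no timeout".
# Halving it is an extra safety margin assumed here, not taken from the gem.
def keepalive_deadline(last_output_ms, remote_idle_timeout_ms)
  return nil if remote_idle_timeout_ms.zero?

  last_output_ms + remote_idle_timeout_ms / 2
end

# True when a heartbeat (empty frame) or any other frame is due now.
def keepalive_due?(now_ms, last_output_ms, remote_idle_timeout_ms)
  deadline = keepalive_deadline(last_output_ms, remote_idle_timeout_ms)
  !deadline.nil? && now_ms >= deadline
end

# Hypothetical values: the peer advertised 60 s, we last wrote at t = 1 s.
p keepalive_deadline(1_000, 60_000)     # => 31000
p keepalive_due?(20_000, 1_000, 60_000) # => false
p keepalive_due?(31_000, 1_000, 60_000) # => true
```

If this reading is right, a periodic dummy message would only help if it is
sent more often than the idle-time-out ActiveMQ advertises.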

Thanks in advance. We're in a great hurry to get this issue resolved (this
week), so I kindly ask for a quick reply.


Regards,
Miha

> TCP sockets remain open in CLOSE_WAIT state
> -------------------------------------------
>
>                 Key: PROTON-1791
>                 URL: https://issues.apache.org/jira/browse/PROTON-1791
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: ruby-binding
>    Affects Versions: proton-c-0.21.0
>         Environment: Confirmed on Ubuntu 16.04 and RHEL 7.4
> Confirmed on qpid_proton 0.19.0 and 0.21.0
>            Reporter: Miha Plesko
>            Assignee: Alan Conway
>            Priority: Major
>              Labels: bug
>             Fix For: proton-c-0.22.0
>
>
> Hi guys,
> thanks for developing the awesome qpid_proton ruby gem, we're using it on a
> daily basis!
> However, we recently noticed the following error in our server log:
> Too many open files - socket(2) for "172.16.117.189" port 5672
> After some research, it turns out that the qpid_proton process keeps
> accumulating more and more open file descriptors like these:
> $ lsof -ap 108533
> ruby    108533 miha  116u  IPv4             562438      0t0      TCP 172.16.117.189:53626->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  197u  IPv4             561644      0t0      TCP 172.16.117.189:53630->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  311u  IPv4             560657      0t0      TCP 172.16.117.189:53634->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  549u  IPv4             565342      0t0      TCP 172.16.117.189:53642->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  576u  IPv4             565122      0t0      TCP 172.16.117.189:53650->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  603u  IPv4             565738      0t0      TCP 172.16.117.189:53654->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  630u  IPv4             563021      0t0      TCP 172.16.117.189:53658->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  657u  IPv4             568361      0t0      TCP 172.16.117.189:53662->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  666u  IPv4             563027      0t0      TCP 172.16.117.189:53666->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  675u  IPv4             567538      0t0      TCP 172.16.117.189:53670->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  684u  IPv4             567998      0t0      TCP 172.16.117.189:53678->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  690u  IPv4             574709      0t0      TCP 172.16.117.189:53686->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  693u  IPv4             578725      0t0      TCP 172.16.117.189:53694->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  696u  IPv4             576840      0t0      TCP 172.16.117.189:53698->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  699u  IPv4             577819      0t0      TCP 172.16.117.189:53702->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  702u  IPv4             582192      0t0      TCP 172.16.117.189:53710->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  705u  IPv4             582861      0t0      TCP 172.16.117.189:53714->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  708u  IPv4             577363      0t0      TCP 172.16.117.189:53718->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  711u  IPv4             578175      0t0      TCP 172.16.117.189:53722->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  714u  IPv4             587172      0t0      TCP 172.16.117.189:53730->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  717u  IPv4             584387      0t0      TCP 172.16.117.189:53734->147.75.102.132:amqp (CLOSE_WAIT)
> ...
> I think the CLOSE_WAIT state indicates that the remote end has already closed
> the TCP connection, but our process never closed the file descriptor. After
> 9 hours or so there are enough of these file descriptors for the OS to
> complain about it (the "Too many open files" error above).
> We did all we could to close connections gracefully:
> connection.container.stop
> connection.close
> connection = nil
> but nothing seems to help. A simple but expensive workaround is to manually 
> invoke Ruby's garbage collection,
> but ideally `connection.close` would close the file descriptor.
> May I kindly ask you to look at this?
> Thank you and Best Regards,
> Miha
> PS: The error occurs both on Ubuntu 16.04 and RHEL 7.4
> PS2: The error occurs both on qpid_proton 0.19.0 and 0.21.0
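
To watch the leak while testing the patch, a small Ruby helper can pick the
CLOSE_WAIT descriptors out of `lsof -ap PID` output like the listing quoted
above. `close_wait_fds` and `current_close_wait_fds` are hypothetical helpers,
and the parser assumes lsof's default column layout (COMMAND, PID, USER, FD,
TYPE, DEVICE, SIZE/OFF, NODE, NAME).

```ruby
# Returns the FD column (e.g. "116u") of every TCP socket in CLOSE_WAIT:
# the peer has closed its side, but this process still holds the descriptor.
def close_wait_fds(lsof_output)
  lsof_output.each_line
             .select { |line| line.include?(" TCP ") && line.rstrip.end_with?("(CLOSE_WAIT)") }
             .map { |line| line.split[3] } # 4th column is FD
end

# Runs lsof against the current Ruby process (assumes lsof is on PATH).
def current_close_wait_fds
  close_wait_fds(`lsof -ap #{Process.pid}`)
end

sample = <<~LSOF
  ruby    108533 miha  116u  IPv4  562438  0t0  TCP 172.16.117.189:53626->147.75.102.132:amqp (CLOSE_WAIT)
  ruby    108533 miha  197u  IPv4  561644  0t0  TCP 172.16.117.189:53630->147.75.102.132:amqp (CLOSE_WAIT)
LSOF
p close_wait_fds(sample) # => ["116u", "197u"]
```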



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
