[ https://issues.apache.org/jira/browse/PROTON-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301675#comment-16301675 ]
ASF subversion and git services commented on PROTON-1727:
---------------------------------------------------------

Commit 223e6d012dab8bbe4bdb92538d84c630bbd1cf27 in qpid-proton's branch refs/heads/master from [~aconway]
[ https://git-wip-us.apache.org/repos/asf?p=qpid-proton.git;h=223e6d0 ]

PROTON-1727 [epoll] fix race condition

Needed to set current_arm flag on second and subsequent connect attempts when
resolve returns multiple socket addresses. Otherwise another thread can delete
the connection early.

> [epoll proactor] segfaults, hangs and leaked FDs around failed connect
> ----------------------------------------------------------------------
>
>                 Key: PROTON-1727
>                 URL: https://issues.apache.org/jira/browse/PROTON-1727
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.18.1
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>            Priority: Blocker
>             Fix For: proton-c-0.20.0
>
>
> There is a race condition that causes leaked FDs and segfaults in the epoll
> proactor under the following conditions:
> - there is more than one thread processing proactor events.
> - attempting to connect to a host address that resolves to multiple socket
> addresses, e.g. resolving the NULL hostname on a machine with ipv4 and ipv6
> enabled.
> - there is nothing listening on the target port.
> The attached reproducer shows several bad behaviors:
> - under rr or valgrind (--tool=memcheck and --tool=helgrind) it quickly (<
> 1min) shows race conditions and/or invalid memory access.
> - it hangs fairly often even without valgrind/rr, more so if you increase the
> thread count. Without valgrind/rr it rarely segfaults.
> - it leaks FDs - the test should run forever, but runs out of FDs around 1024
> iterations.
> This is probably the cause of
> https://issues.apache.org/jira/browse/DISPATCH-902, which does occur very
> frequently under the conditions described there.
> The test program should run forever without leaking or showing any faults.
> Note that gcc -fsanitize does not detect races or memory errors, which
> suggests the bug requires a delay at the right time to manifest. Valgrind's
> overhead and rr's code serialization appear to provide that delay. It seems
> likely that dispatch's reconnect logic is providing the delay in DISPATCH-902.
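
For illustration only, the trigger condition described in the issue (an address that resolves to several socket addresses, with nothing listening on the target port) can be sketched with plain POSIX calls. This is not the attached reproducer and not Proton code: try_connect_all is a hypothetical helper, it uses blocking sockets rather than epoll, and it is single-threaded, so it shows only the sequence of connect attempts and not the race itself.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of Proton: resolve the NULL hostname for
 * `port` and try a blocking connect() to each returned address in turn.
 * Returns a connected socket, or -1 if every attempt failed.
 * If `attempts` is non-NULL it receives the number of addresses tried. */
static int try_connect_all(const char *port, int *attempts)
{
    struct addrinfo hints, *res = NULL, *ai;
    int fd = -1, tried = 0;

    assert(port != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept both IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM;

    /* A NULL node without AI_PASSIVE resolves to the loopback address(es). */
    if (getaddrinfo(NULL, port, &hints, &res) != 0) {
        if (attempts) *attempts = 0;
        return -1;
    }

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;                /* address family not usable here */
        ++tried;
        /* In an event-driven design, each retry would register a new FD
         * for events; the commit message says the second and subsequent
         * attempts are where the flag was not being set. */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                   /* connected: keep this socket */
        close(fd);                   /* close the failed socket so no FD leaks */
        fd = -1;
    }

    freeaddrinfo(res);
    if (attempts) *attempts = tried;
    return fd;
}
```

Calling try_connect_all("5672", &n) on a machine with both IPv4 and IPv6 loopback enabled, and nothing listening on that port, would normally return -1 after more than one attempt; the port number here is just an example.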