[https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346797#comment-15346797]
Michael Hrivnak edited comment on QPID-7317 at 6/23/16 5:23 PM:
----------------------------------------------------------------
The process in question did not produce any log statements from or related to
qpid.
This is what I see from strace:
# strace -p 21739
Process 21739 attached
restart_syscall(<... resuming interrupted call ...>) = 0
poll([{fd=19, events=POLLIN}], 1, 3000) = 0 (Timeout)
poll([{fd=19, events=POLLIN}], 1, 3000) = 0 (Timeout)
poll([{fd=19, events=POLLIN}], 1, 3000) = 0 (Timeout)
poll([{fd=19, events=POLLIN}], 1, 3000) = 0 (Timeout)
poll([{fd=19, events=POLLIN}], 1, 3000) = 0 (Timeout)
Note that I do not see this polling in a healthy child worker process.
I think thread 3, the thread the attached backtrace is from, is stuck in a loop
that polls FD 19 for input (POLLIN) with a 3-second timeout, waiting for an
event that is never going to happen.
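To illustrate the pattern described above, here is a minimal Python 3 sketch of a thread that waits on the read end of a pipe with a 3000 ms poll() timeout and re-checks a predicate after every wakeup or timeout. This is not Qpid's actual code; `PipeWaiter`, `wakeup` and `wait_until` are hypothetical names. Another thread wakes the waiter by writing a byte to the pipe. If that write never happens and the predicate never becomes true, the loop produces exactly one timed-out `poll(...) = 0 (Timeout)` every 3 seconds, which matches the strace output.

```python
import os
import select
import threading
import time


class PipeWaiter:
    """Hypothetical waiter: blocks in poll() on the read end of a pipe and
    is woken when another thread writes a byte to the write end."""

    def __init__(self):
        self._rfd, self._wfd = os.pipe()
        self._poller = select.poll()
        self._poller.register(self._rfd, select.POLLIN)

    def wakeup(self):
        # Called by whichever thread changes the awaited state.
        os.write(self._wfd, b"\0")

    def wait_until(self, predicate, poll_ms=3000, deadline=None):
        """Re-check `predicate` after every wakeup or poll timeout.

        Returns True once the predicate holds, or False if `deadline`
        (a time.monotonic() value) passes first. With deadline=None and
        no thread ever calling wakeup(), this never returns and issues one
        timed-out poll() per `poll_ms`.
        """
        while not predicate():
            if deadline is not None and time.monotonic() >= deadline:
                return False
            events = self._poller.poll(poll_ms)  # [] on timeout
            if events:
                os.read(self._rfd, 4096)  # drain the wakeup byte(s)
        return True


if __name__ == "__main__":
    waiter = PipeWaiter()
    done = []

    def worker():
        time.sleep(0.1)
        done.append(True)
        waiter.wakeup()

    threading.Thread(target=worker).start()
    print(waiter.wait_until(lambda: bool(done)))  # True

    # Nobody ever makes the predicate true; bounded by a deadline here
    # only so that the demo terminates.
    print(waiter.wait_until(lambda: False, poll_ms=100,
                            deadline=time.monotonic() + 0.3))  # False
```

The healthy case returns as soon as the other thread writes to the pipe. The hung case only differs in that the writer never shows up.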
FD 19 appears to be a FIFO pipe. I will attach lsof output separately.
# ls -l /proc/21739/fd/19
lr-x------. 1 apache apache 64 Jun 21 13:45 /proc/21739/fd/19 -> pipe:[152836]
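The `lr-x------` mode on the fd link suggests that FD 19 is the read end of the pipe, so the useful question is who holds the write end. The Linux-only Python 3 sketch below does roughly what lsof does for a single pipe. It scans `/proc/<pid>/fd` for links to the same `pipe:[inode]` and reads the octal `flags:` line in `/proc/<pid>/fdinfo/<fd>` to tell read ends from write ends. The helper names `pipe_inode` and `pipe_holders` are hypothetical. The sketch identifies processes and fds, not threads. If the write end belongs to the same process, the remaining question is which thread was supposed to write to it.

```python
import os
import re

ACCMODE = 0o3  # O_ACCMODE on Linux: the access-mode bits of the open flags
MODES = {os.O_RDONLY: "r", os.O_WRONLY: "w", os.O_RDWR: "rw"}


def pipe_inode(pid, fd):
    """Return the pipe inode behind an fd, e.g. 152836 for 'pipe:[152836]'."""
    target = os.readlink(f"/proc/{pid}/fd/{fd}")
    m = re.fullmatch(r"pipe:\[(\d+)\]", target)
    if m is None:
        raise ValueError(f"/proc/{pid}/fd/{fd} is not a pipe: {target}")
    return int(m.group(1))


def pipe_holders(inode):
    """Yield (pid, fd, mode) for every open fd on the given pipe inode.

    Processes we cannot inspect (permissions, or exited mid-scan) are skipped.
    """
    link = f"pipe:[{inode}]"
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") != link:
                    continue
                with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                    flags_line = next(
                        (line for line in f if line.startswith("flags:")), None)
            except OSError:  # fd closed or process exited meanwhile
                continue
            if flags_line is None:
                continue
            flags = int(flags_line.split()[1], 8)  # fdinfo prints flags in octal
            yield int(pid), int(fd), MODES.get(flags & ACCMODE, "?")
```

Usage, run as root or as the `apache` user: `for pid, fd, mode in pipe_holders(152836): print(pid, fd, mode)`.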
> Deadlock on publish
> -------------------
>
> Key: QPID-7317
> URL: https://issues.apache.org/jira/browse/QPID-7317
> Project: Qpid
> Issue Type: Bug
> Components: Python Client
> Affects Versions: 0.32
> Environment: python-qpid-0.32-13.fc23.noarch
> Reporter: Brian Bouterse
> Attachments: bt.txt, lsof.txt
>
>
> When publishing a task with qpid.messaging, the publish deadlocks and our
> application cannot continue. This had not been a problem for several
> releases, but recently, within a few days of each other, another Satellite
> developer and I both experienced the issue on separate machines running
> different distros. He is using an MRG-built package (version unknown). I am
> using python-qpid-0.32-13.fc23.
> Core dumps were taken of the deadlocked processes on both machines, and both
> show only 1 Qpid thread where I expect there to be 2. There are other mongo
> threads, but those are idle as expected and unrelated. The traces show our
> application calling into qpid.messaging to publish a message to the message
> bus.
> This problem happens intermittently. In cases where the publish succeeds, I
> have verified by core dump that the expected 2 Qpid threads are present.
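Counting Qpid threads does not strictly require a core dump if code can be run inside the affected process. The following minimal Python 3 sketch walks `sys._current_frames()` and reports every live thread whose stack passes through matching code. It assumes that frames from files whose path contains "qpid" identify Qpid client code. That heuristic is not something Qpid guarantees. `threads_in_code` and `is_qpid_frame` are hypothetical names.

```python
import sys
import threading
import traceback


def is_qpid_frame(frame):
    # Heuristic (an assumption): any file path containing "qpid" is Qpid code.
    return "qpid" in frame.filename


def threads_in_code(match=is_qpid_frame):
    """Map thread ident -> (thread name, innermost matching FrameSummary)
    for every live thread whose stack has a frame accepted by `match`."""
    names = {t.ident: t.name for t in threading.enumerate()}
    found = {}
    for ident, top in sys._current_frames().items():
        # extract_stack returns frames ordered outermost first, innermost last.
        hits = [fs for fs in traceback.extract_stack(top) if match(fs)]
        if hits:
            found[ident] = (names.get(ident, f"thread-{ident}"), hits[-1])
    return found


if __name__ == "__main__":
    for ident, (name, fs) in threads_in_code().items():
        print(f"{name} ({ident}): {fs.filename}:{fs.lineno} in {fs.name}")
```

To use this on a hung process, the application would need to trigger it itself, for example from a signal handler. On Python 3, the stdlib call `faulthandler.register(signal.SIGUSR1)` is an alternative that dumps every thread's traceback when the process receives that signal.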