[ 
https://issues.apache.org/jira/browse/QPID-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517625#comment-15517625
 ] 

ASF subversion and git services commented on QPID-5637:
-------------------------------------------------------

Commit 037c5738734d8fecb7b7f7e7af4e4f14f9cd3a64 in qpid-python's branch 
refs/heads/master from [~aconway]
[ https://git-wip-us.apache.org/repos/asf?p=qpid-python.git;h=037c573 ]

QPID-7317: Fix hangs in qpid.messaging.

The hang is observed in processes using qpid.messaging where a thread is blocked
waiting for the Selector to wake it, but there is no Selector.run thread.

This patch removes all the known ways that this hang can occur. qpid.messaging
now either functions normally or immediately raises an exception and logs a
message starting with "qpid.messaging:" to the "qpid.messaging" logger.

The following issues are fixed:

1. The Selector.run() thread raises a fatal exception.

Use of qpid.messaging will re-raise the exception immediately, not hang.
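
The behaviour in item 1 can be sketched roughly as follows. This is a minimal
illustration, not the code from the commit: Driver, run() and wait() are
hypothetical names, and it assumes Python 3 (threading.Condition.wait_for).
The background thread records its fatal exception and wakes all waiters, and
every wait re-raises that exception instead of blocking.

```python
import threading


class Driver:
    """Hypothetical stand-in for an object whose run() executes in a background thread."""

    def __init__(self):
        self.exception = None              # fatal error raised by run(), if any
        self._cond = threading.Condition()

    def run(self, loop):
        try:
            loop()                         # the real I/O loop would go here
        except Exception as e:
            with self._cond:
                self.exception = e         # remember the failure
                self._cond.notify_all()    # wake every blocked waiter

    def wait(self, predicate, timeout=None):
        """Block until predicate() is true, or re-raise the thread's fatal error."""
        with self._cond:
            self._cond.wait_for(
                lambda: predicate() or self.exception is not None, timeout)
            if self.exception is not None:
                raise self.exception
            return predicate()
```

A caller that would previously have blocked forever now sees the original
exception as soon as it waits.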

2. The process forks, so child has no Selector thread.

https://issues.apache.org/jira/browse/QPID-5637 resets the Selector after a
fork. In addition we now:

- Close Selector.waiter: its file descriptors are shared with the parent, which
  can cause havoc if the two processes "steal" each other's wakeups.

- Replace Endpoint._lock in related endpoints with a BrokenLock. If the parent
  is holding locks when it forks, they remain locked forever in the child.
  BrokenLock.acquire() raises instead of hanging.
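
A rough sketch of the fork handling described in item 2, under stated
assumptions: the names Selector, Endpoint._lock and BrokenLock come from the
text above, but the bodies, the pid and endpoints attributes, and the default()
class method are hypothetical and are not taken from the commit. Closing the
waiter's file descriptors is omitted, and the sketch is not thread-safe.

```python
import os
import threading


class BrokenLock:
    """Replaces a lock that may have been held at fork(): acquire() raises."""

    def acquire(self, *args, **kwargs):
        raise RuntimeError("qpid.messaging: lock is unusable after fork()")

    def release(self):
        pass

    def __enter__(self):
        return self.acquire()

    def __exit__(self, *exc_info):
        self.release()


class Endpoint:
    """Hypothetical endpoint guarded by a lock."""

    def __init__(self):
        self._lock = threading.RLock()


class Selector:
    """Hypothetical per-process singleton that notices a PID change."""

    _default = None

    def __init__(self):
        self.pid = os.getpid()    # process that created this instance
        self.endpoints = []

    @classmethod
    def default(cls):
        old = cls._default
        if old is None or old.pid != os.getpid():
            if old is not None:
                # We are in a forked child: locks the parent held at fork()
                # would never be released here, so make them fail fast.
                for endpoint in old.endpoints:
                    endpoint._lock = BrokenLock()
            cls._default = cls()
        return cls._default
```

In the child, any attempt to use an endpoint inherited from the parent then
fails with an exception at the first lock acquisition rather than hanging.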

3. Selector.stop() called on atexit.

Selector.stop was registered via atexit, which could cause a hang if
qpid.messaging was used in a later-executing atexit function. That registration
has been removed: Selector.run() runs in a daemon thread, so there is no need
for stop().
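
For illustration, this is how a daemon thread is started with the standard
threading module (Python 3 syntax). The helper start_io_thread and the thread
name are hypothetical. The interpreter does not wait for daemon threads at
shutdown, so no atexit hook is needed to stop them.

```python
import threading


def start_io_thread(loop):
    """Run loop() in a daemon thread; interpreter exit will not wait for it."""
    thread = threading.Thread(target=loop, name="io-loop", daemon=True)
    thread.start()
    return thread
```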

4. User calls Selector.stop() directly.

There is no reason to do this for the default Selector used by qpid.messaging,
so for that case stop() is now ignored. It works as before for code that creates
its own qpid.Selector instances.
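
One way such a guard could look, as a sketch only: the class below is a
hypothetical stand-in, not the actual Selector, and the stopped flag replaces
whatever real shutdown work stop() performs.

```python
class Selector:
    """Hypothetical selector with a shared default instance."""

    _default = None

    def __init__(self):
        self.stopped = False

    @classmethod
    def default(cls):
        if cls._default is None:
            cls._default = cls()
        return cls._default

    def stop(self):
        if self is Selector._default:
            return              # shared instance: ignore the request
        self.stopped = True     # private instance: stop as before
```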


> Python client does not reset the Selector singleton when the process id 
> changes.
> --------------------------------------------------------------------------------
>
>                 Key: QPID-5637
>                 URL: https://issues.apache.org/jira/browse/QPID-5637
>             Project: Qpid
>          Issue Type: Bug
>          Components: Python Client
>    Affects Versions: 0.24, 0.26
>            Reporter: Brian Bouterse
>            Assignee: Ken Giusti
>            Priority: Blocker
>             Fix For: 0.29
>
>         Attachments: celery_worker_output.txt, pid_aware_selector.patch, 
> tasks.py
>
>
> qpid.messaging has an issue with forking in the following situation.
> 1.  A parent Python process imports and uses qpid.messaging to connect to a 
> Qpid broker
> 2.  The parent process forks a child process
> 3.  The child process imports qpid.messaging and tries to connect to a Qpid 
> broker.
> I expected to see the child process use qpid.messaging normally as it would 
> if it weren't forked in the way described above.  Instead, the server 
> receives the opening of a TCP socket, but the client never sends the AMQP 
> protocol announcement.
> [Forking brings the parent's file descriptors with 
> it|http://man7.org/linux/man-pages/man2/fork.2.html].  I expected the file 
> descriptors on the parent and the child to be the same, and to reference the 
> same socket, so I expected qpid.messaging to work without any modification.  
> Surprisingly, it does not.
> There is at least one place where I do understand how this can be avoided.  
> One of the issues is that the file descriptors registered by the Selector 
> object inside of qpid.messaging are stale after the fork.  The Selector 
> object uses a singleton pattern to provide a reference to the same Selector 
> object no matter how many times you call it.  This selector object already 
> has registered file descriptors with the filesystem, which allow the selector 
> to read/write data in an I/O efficient manner.  See the attached 
> [pid_aware_selector.patch] for an example of this.
> The [pid_aware_selector.patch] does allow communication to flow, but queue 
> creation and deletion sometimes fail in strange ways.  For instance, code in 
> the child process creates a queue and then reads information about that 
> queue.  The queue was created, yet the read says that the queue can't be 
> found.  Very strange.  You can see those things fail using the following 
> example:
> 1.  Clone our fork of kombu:       `git clone g...@github.com:pulp/kombu.git`
> 2.  Change into the kombu folder     `cd kombu`
> 3.  Switch to the branch containing the qpid code:  `git checkout 
> pulp-dep-3.0.15-with-qpid`
> 4.  Install kombu onto your system or virtualenv (I do it systemwide using 
> sudo):   `sudo python setup.py develop`
> 5.  Install celery version 3.1.11.  I do it using pip.    `sudo pip install 
> celery==3.1.11`
> 6.  Install qpid.messaging and qpidtoollibs.  One way I do it is systemwide 
> using pip.      `sudo pip install qpid-tools qpid-python`
> 7.  Start up qpidd (We've been testing with 0.24 and auth off).      `sudo -u 
> qpidd qpidd --auth=no`
> 8.  Put the attached file tasks.py into a directory
> 9.  Open two terminals and change their working directory to the directory 
> from step 8.
> 10.  In one terminal start the celery worker        `celery worker -A 
> tasks --loglevel=INFO -c1`
> 11.  In the other terminal dispatch 10 tasks             `python tasks.py`
> You should see exceptions raised similar to those in the attached file 
> [celery_worker_output.txt]
> Note that the code on the pulp-dep-3.0.15-with-qpid branch of kombu 
> monkey-patches qpid.messaging with the selector patch referenced above, and 
> also one or two other bugfix patches.  You can see that [monkey patching done 
> here|https://github.com/pulp/kombu/blob/pulp-dep-3.0.15-with-qpid/kombu/transport/qpid.py#L45].
>   This should have no implications for this issue, but I want to be explicit 
> about it.
> A potential fix:  Celery supports a callback after child processes are 
> forked, allowing a call to clean up or reset exactly these types of things.  
> I could wire up that callback if such a cleanup/reset method existed on 
> qpid.messaging.  For testing purposes, you could put a call to this cleanup 
> method in the 'sometask' code before the call to 
> controller.inspect().active_queues().  This would be similar in timing to a 
> post-fork cleanup/reset call.
> Note: the original connection and associated senders/receivers/sessions are 
> still in use by the parent process, so calling close() is not the right thing 
> to do either.  It's like the connection needs to be forgotten, and the file 
> descriptors unregistered from the child process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org
