[ 
https://issues.apache.org/jira/browse/QPID-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Conway updated QPID-5139:
------------------------------

    Description: 
When the client sends a "prepare" command for a transaction, the thread 
handling that command is blocked until all backups have responded with their 
prepare status. This can easily deadlock the broker if there are more 
concurrent transactions than worker threads.
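
The failure mode can be sketched with a toy model. This is not the broker's code: the names `simulate`, `prepare` and `backup_response` are hypothetical, and the model assumes that handling a backup's response needs a thread from the same fixed-size pool that the blocked "prepare" handlers are holding.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

def simulate(workers, transactions, blocking, grace=0.5):
    """Return how many backup responses were processed within `grace` seconds."""
    acks = [threading.Event() for _ in range(transactions)]

    def prepare(ack):
        if blocking:
            ack.wait()  # hold the worker thread until the backup responds

    def backup_response(ack):
        ack.set()       # assumption: handling the response needs a worker too

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for ack in acks:
            pool.submit(prepare, ack)
        responses = [pool.submit(backup_response, ack) for ack in acks]
        done, _ = wait(responses, timeout=grace)
        for ack in acks:
            ack.set()   # cleanup: release stuck prepares so the pool can exit
    return len(done)

print(simulate(workers=2, transactions=3, blocking=True))   # 0
print(simulate(workers=2, transactions=3, blocking=False))  # 3
```

With blocking handlers every worker is parked in `prepare`, so no response is ever processed. When `prepare` returns immediately and completion is driven by the response, the same pool makes progress.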

To reproduce, you need a way to delay completion of a transaction. The following
regression test illustrates the problem; it will be added to TransactionTests in
ha_tests.py as part of the fix:
{noformat}
    def test_tx_block_threads(self):
        """Verify that TXs blocked in commit don't block broker threads."""
        cluster = HaCluster(self, 2, args=["--worker-threads=2"])
        sessions = [cluster[0].connect().session(transactional=True) for i in xrange(2)]
        for s in sessions: s.sender("foo;{create:always}").send("foo")
        self.assertEqual(2, len(cluster[1].agent().tx_queues()))
        os.kill(cluster[1].pid, signal.SIGSTOP) # Freeze backup so tx can't complete.
        threads = [ Thread(target=s.commit) for s in sessions]
        for t in threads: t.start()
        cluster[0].ready(timeout=1) # Should not block
        os.kill(cluster[1].pid, signal.SIGCONT) # Allow tx to complete.
        for t in threads: t.join()
        for s in sessions: s.connection.close()
{noformat}


  was:
When the client sends a "prepare" command for a transaction, the thread 
handling that command is blocked until all backups have responded with their 
prepare status. This can easily deadlock the broker if there are more 
concurrent transactions than worker threads.

To reproduce, you need a way to delay completion of a transaction. The following
regression test illustrates the problem; it will be added to TransactionTests in
ha_tests.py as part of the fix:

    def test_tx_block_threads(self):
        """Verify that TXs blocked in commit don't block broker threads."""
        cluster = HaCluster(self, 2, args=["--worker-threads=2"])
        sessions = [cluster[0].connect().session(transactional=True) for i in xrange(2)]
        for s in sessions: s.sender("foo;{create:always}").send("foo")
        self.assertEqual(2, len(cluster[1].agent().tx_queues()))
        os.kill(cluster[1].pid, signal.SIGSTOP) # Freeze backup so tx can't complete.
        threads = [ Thread(target=s.commit) for s in sessions]
        for t in threads: t.start()
        cluster[0].ready(timeout=1) # Should not block
        os.kill(cluster[1].pid, signal.SIGCONT) # Allow tx to complete.
        for t in threads: t.join()
        for s in sessions: s.connection.close()




> HA transactions block a thread, can deadlock the broker.
> --------------------------------------------------------
>
>                 Key: QPID-5139
>                 URL: https://issues.apache.org/jira/browse/QPID-5139
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Clustering
>    Affects Versions: 0.24
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>            Priority: Critical
>             Fix For: 0.25
>
>
> When the client sends a "prepare" command for a transaction, the thread 
> handling that command is blocked until all backups have responded with their 
> prepare status. This can easily deadlock the broker if there are more 
> concurrent transactions than worker threads.
> To reproduce, you need a way to delay completion of a transaction. The
> following regression test illustrates the problem; it will be added to
> TransactionTests in ha_tests.py as part of the fix:
> {noformat}
>     def test_tx_block_threads(self):
>         """Verify that TXs blocked in commit don't block broker threads."""
>         cluster = HaCluster(self, 2, args=["--worker-threads=2"])
>         sessions = [cluster[0].connect().session(transactional=True) for i in xrange(2)]
>         for s in sessions: s.sender("foo;{create:always}").send("foo")
>         self.assertEqual(2, len(cluster[1].agent().tx_queues()))
>         os.kill(cluster[1].pid, signal.SIGSTOP) # Freeze backup so tx can't complete.
>         threads = [ Thread(target=s.commit) for s in sessions]
>         for t in threads: t.start()
>         cluster[0].ready(timeout=1) # Should not block
>         os.kill(cluster[1].pid, signal.SIGCONT) # Allow tx to complete.
>         for t in threads: t.join()
>         for s in sessions: s.connection.close()
> {noformat}



