-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50142/#review142581
-----------------------------------------------------------


Ship it!




Ship It!

- Ted Ross


On July 18, 2016, 12:04 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50142/
> -----------------------------------------------------------
> 
> (Updated July 18, 2016, 12:04 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Ted Ross.
> 
> 
> Repository: qpid-cpp
> 
> 
> Description
> -------
> 
> Fixed bug in backup recovery consistent with stack traces from a customer and
> similar traces reproduced as explained in
>     https://bugzilla.redhat.com/show_bug.cgi?id=1333767#c19.
> 
> The bug was in BrokerReplicator: an interrupted failover left the 
> UpdateTracker
> in place with a queue list. The next failover created a new UpdateTracker and
> deleted the old one, which triggered deletion of queues. This caused 
> re-entrant
> deletion of bridges while Link::ioThreadProcessing is iterating over the 
> bridge
> list, which could cause the iteration to go past the end of the list and 
> access
> invalid memory.
> 
> The exact circumstances are hard to reproduce, it requires multiple steps:
> 
> 1. Backup receives queue creation events and calls to Link::add() with new
> bridges, but is disconnected from primary before the 
> Link::ioThreadProcessing()
> event. This puts the new bridges on the Link::created list *before* the
> BrokerReplicator bridge.
> 
> 2. Backup re-connects and creates an UpdateTracker with a list of all queues.
> Before the update is complete, the backup is disconnected leaving the
> UpdateTracker in place.
> 
> 3. Backup re-connects again and deletes the old UpdateTracker, which deletes 
> all
> the queues and their bridges in the Link::created vector. This means the 
> iterator
> in Link::ioThreadProcessing now points after the end of the vector.
> 
> Step 1 is needed because normally the BrokerReplicator is the first bridge in
> the list, and deleting queues after it leaves the iterator inside bounds. 
> There
> needs to be at least one deleted queue before the BrokerReplicator in the 
> list.
> 
> There are a number of reasons a failover could be interrupted:
> 
> - two primaries are killed in quick succession.
> - primary disconnects an expected "ready" backup at the ha-backup-timeout.
> - a backup misses a heartbeat during failover.
> 
> 
> Diffs
> -----
> 
>   src/qpid/ha/BrokerReplicator.cpp d664f13893a075dae459fbc8d50bbb44751832ba 
> 
> Diff: https://reviews.apache.org/r/50142/diff/
> 
> 
> Testing
> -------
> 
> Passes qpid ctest suite. No auto regression test, not able to reliabily 
> reproduce. Long run stress tests have not shown the bug since the patch.
> 
> 
> Thanks,
> 
> Alan Conway
> 
>

Reply via email to