Whilst trying to debug a probable race causing "module reload app_queue" to
lose a queue when it clashed with offering a call to an agent channel, I found
a potential race condition that still seems to exist in trunk. Unfortunately,
this race fails towards a queue not being marked dead when it should be, rather
than towards a queue staying marked dead when it isn't, so it doesn't explain
our problem. Nonetheless, I think it needs recording.

mark_dead_and_unfound executes the following with no lock on the queue:

    q->dead = 1;

The problem with this is that "dead" is a bit field. In particular it shares a
byte with "wrapped", which is a bit field that does get updated in normal
operation. This means the assignment actually compiles as Load, Or, Store on
that byte. If an update of "wrapped" on another thread spans this Store (it
loads the byte before the Store and writes it back afterwards), the Store gets
wiped out, leaving "dead" unchanged. Similarly, this code could undo a
concurrent update of "wrapped".
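
To make the interleaving concrete, here is a minimal single-threaded sketch
that replays one bad ordering by hand on a plain byte. The masks DEAD_BIT and
WRAPPED_BIT and both function names are made up for illustration; the real bit
positions are chosen by the compiler and are not taken from app_queue.

```c
#include <assert.h>

/* Assumed bit positions, for illustration only; the compiler decides the
 * real layout of the "dead" and "wrapped" bit fields. */
#define DEAD_BIT    0x01u
#define WRAPPED_BIT 0x02u

/* Replays, in one thread, an interleaving of two Load, Or, Store sequences
 * on the byte that holds both flags. Returns the final value of the byte. */
unsigned char lost_dead_update(void)
{
    unsigned char shared = 0;   /* the byte holding both flags */
    unsigned char a, b;         /* each thread's private copy  */

    b = shared;                 /* thread B: Load (about to set wrapped) */
    a = shared;                 /* thread A: Load (about to set dead)    */
    a |= DEAD_BIT;              /* thread A: Or                          */
    shared = a;                 /* thread A: Store, dead is now set      */
    b |= WRAPPED_BIT;           /* thread B: Or, on its stale copy       */
    shared = b;                 /* thread B: Store, wipes out dead       */

    return shared;
}

/* Checks the outcome of the replay: wrapped survives, dead is lost. */
void check_lost_dead_update(void)
{
    unsigned char result = lost_dead_update();

    assert(result & WRAPPED_BIT);
    assert(!(result & DEAD_BIT));
}
```

Swapping the order of the two Stores loses the update of "wrapped" instead,
which is the second failure described above.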
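
Note that this failure mode does not depend on the processor performing memory
accesses out of order, so memory fences won't help. Similarly, "volatile"
won't help: the problem is that the read-modify-write of the shared byte is
not atomic, not that an access is reordered or optimised away.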
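
What would close the window is making every writer of either flag hold the
same lock. The following is only a sketch under that assumption, using a plain
pthread mutex; struct demo_queue and the demo_* functions are hypothetical and
are not the app_queue code, which has its own queue lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical cut-down queue; only the flag names come from app_queue. */
struct demo_queue {
    pthread_mutex_t lock;       /* stands in for the queue's own lock */
    unsigned int dead:1;
    unsigned int wrapped:1;
};

static void demo_lock(struct demo_queue *q)
{
    int rc = pthread_mutex_lock(&q->lock);

    assert(rc == 0);
    (void) rc;                  /* silence unused warning under NDEBUG */
}

static void demo_unlock(struct demo_queue *q)
{
    int rc = pthread_mutex_unlock(&q->lock);

    assert(rc == 0);
    (void) rc;
}

/* With the lock held, the Load, Or, Store on the shared byte cannot
 * interleave with another writer that takes the same lock. */
void demo_mark_dead(struct demo_queue *q)
{
    demo_lock(q);
    q->dead = 1;
    demo_unlock(q);
}

void demo_set_wrapped(struct demo_queue *q, unsigned int value)
{
    demo_lock(q);
    q->wrapped = value ? 1 : 0;
    demo_unlock(q);
}
```

This only helps if every other update of "wrapped", and of any other bit field
packed into the same storage, is also made with that lock held.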

The exact bit allocations will vary between our version and the trunk one. In
our version, "dead" and "wrapped" definitely share a byte. Based on bit
counting, I would expect the same for trunk.

--
David Woolley
BTS Holdings Plc
Tel: +44 (0)20 8401 9000 Fax: +44 (0)20 8401 9100
http://www.bts.co.uk
BTS Holdings PLC - Registered office: BTS House, Manor Road, Wallington, SM6
0DD - Registered in England: 1517630