Chuck Rolke created DISPATCH-1136:
-------------------------------------

             Summary: Receiver crash due to data corruption on multicast presettled messages
                 Key: DISPATCH-1136
                 URL: https://issues.apache.org/jira/browse/DISPATCH-1136
             Project: Qpid Dispatch
          Issue Type: Bug
          Components: Routing Engine
    Affects Versions: 1.3.0
         Environment: Fedora 27. Three routers connected serially, as
                      described in DISPATCH-1124.

            Reporter: Chuck Rolke
            Assignee: Chuck Rolke


After applying the fixes from DISPATCH-1124 and DISPATCH-1129, receivers in
long-running multicast presettled tests still fail with corrupted data
sequences. There is no single symptom; several have been observed:
 * Receivers use all system memory and cache and are killed by the OOM killer
 * underrun
 * illegal value for field

Investigation shows that the function qdr_forward_drop_presettled_CT_LH is
routinely dropping presettled deliveries that have already begun transmitting
bytes to the wire. Once that happens, a race determines whether the message is
transmitted successfully or is torn down in the middle of transmission.
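
The condition described above can be illustrated with a minimal sketch in C.
This is not the router's actual code: the type sketch_delivery_t, its fields,
and the function sketch_can_drop are hypothetical names invented for
illustration. It only shows the kind of guard that would keep a presettled
delivery from being dropped after some of its bytes have reached the wire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a router delivery record.
 * This is NOT the real qdr_delivery_t from Qpid Dispatch. */
typedef struct {
    bool   presettled;  /* delivery was settled by the sender up front */
    size_t bytes_sent;  /* bytes of this message already written to the wire */
} sketch_delivery_t;

/* Assumed policy: a presettled delivery may be dropped only while none of
 * its bytes have been transmitted. Dropping it later risks tearing the
 * message down mid-transmission and corrupting the receiver's data stream. */
static bool sketch_can_drop(const sketch_delivery_t *dlv)
{
    assert(dlv != NULL);
    return dlv->presettled && dlv->bytes_sent == 0;
}
```

Under this assumed policy, a delivery that is still queued (bytes_sent == 0)
remains droppable, while one that is partially transmitted is left to finish.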

To reproduce this error, the sender must supply messages significantly faster
than the receiving router can forward them to the next router. This triggers
the presettled drops. My test setup does this by running the sender and the
receiving router on the same laptop and connecting the next router over a
relatively slow WiFi link.

 



