We have isolated a problem where the TCP4 driver fails to acknowledge received 
data under certain circumstances.

Background on DPCs: The DPC mechanism allows functions to be called at a later 
point in time at a requested TPL level.  Functions are queued through the DPC 
Protocol's QueueDpc interface.  DPCs are dispatched anytime a module calls the 
DispatchDpc function in the DPC protocol.  The dispatch process will execute 
all queued DPCs that were registered to execute at or above the caller's TPL 
level (e.g. if caller is at TPL_CALLBACK the DPC dispatch will execute anything 
between CALLBACK and HIGH_LEVEL).

The stack depends on DispatchDpc being called at appropriate preemption points 
to advance packet processing.  The dispatch function is called in multiple 
layers as you can see by searching for DispatchDpc with calls originating from 
ArpDxe, Ip4Dxe, MnpDxe, Udp4Dxe, UefiPxeBcDxe, DnsDxe, HttpDxe, Ip6Dxe and 
Udp6Dxe.

Currently it's possible for DPC dispatching to occur in a nested manner.  
Imagine a case where an upper network stack layer queues a DPC (for example the 
TCP layer's TcpTickingDpc) and in the DPC handler it makes use of lower layers 
(like sending a packet through IP+MNP).  As part of packet processing these 
lower layers will call DispatchDpc resulting in nested DPCs execution.


Here's an example of the DPC nesting, the indent level indicates the level of 
nesting for the DPC

TcpTicking
DpcDispatchDpc TPL=CALLBACK
    TcpTickingDpc
    TcpTickingDpc Tcb: 0x49a42dd0, DelayedAck=1, CtrlFlag: 0x1019
    TcpTickingDpc call TcpSendAck delayed is: 1
    TcpSendAck  Seq=158464588 Ack=4152002304
    TcpTransmitSegment DelayedAck = 0
    Nbuf SendIpPacket: DestPort=62462 Seq=158464588 Ack=4152002304 Window=19840
    MnpSyncSendPacket call DispatchDpc
    DpcDispatchDpc TPL=CALLBACK
        Ip4AccpetFrame call DispatchDpc
        DpcDispatchDpc TPL=CALLBACK
            Ip4FreeTxToken call DispatchDpc
            DpcDispatchDpc TPL=CALLBACK
                TcpInput: DestPort=8816 Seq=4152002304 Ack=158464588 Len=1460
                TcpClearTimer Tcb: 0x49a42dd0

Notice how the process of sending the TCP ACK via IP then MNP causes another 
DispatchDpc to occur before the TCP segment transmit call returns.  This 
nesting continues on and a whole bunch of code has executed (all at CALLBACK 
TPL).  You can see near the end that we even begin processing another TCP 
packet.


If we look in detail what the TcpSendAck does there are two key steps:

  if (TcpTransmitSegment (Tcb, Nbuf) == 0) {
    TCP_CLEAR_FLG (Tcb->CtrlFlag, TCP_CTRL_ACK_NOW);
    Tcb->DelayedAck = 0;
  }

It transmits the segment and after the transmit returns it clears the 
DelayedAck counter since the presumption is that we sent an ACK for the most 
recent segment and we are caught up.  But because DPCs are dispatched within 
TcpTransmitSegment the assumption that this transmit was the last one is 
incorrect.


Here is a portion of a trace illustrating the problem:

TcpTicking
DpcDispatchDpc TPL=CALLBACK
    TcpTickingDpc
    TcpTickingDpc Tcb: 0x49a42dd0, DelayedAck=1, CtrlFlag: 0x1019
    TcpTickingDpc call TcpSendAck delayed is: 1
    TcpSndAck  Seq=158464588 Ack=4152002304
    TcpTransmitSegment DelayedAck = 0
    Nbuf SendIpPacket: DestPort=62462 Seq=158464588 Ack=4152002304 Window=19840
    MnpSyncSendPacket call DispatchDpc
    DpcDispatchDpc TPL=CALLBACK

        [snip - a bunch of nested DPC processing removed]

         DpcDispatchDpc Tpl= TPL=CALLBACK
          TcpInput: DestPort=8816 Seq=4152019824 Ack=158464588 Len=1460
          TcpClearTimer Tcb: 0x49a42dd0
          TcpInput Seq=4152019824 Tcb: 0x49a42dd0, Tcb->DupAck = 0
          TcpToSendAck add to delayedack Seq=158464588 Ack=4152021284
          TcpToSendAck add to delayedack Tcb: 0x49a42dd0, Seq=158464588 
Ack=4152021284, DelayedAck=1
          ^^ NOTE: the DelayedAck flag has been set to one indicating that we 
haven't acknowledged yet and need to soon

      [we return from 14 nested DPC calls !!]

    TcpSndAck  Tcb->DelayedAck = 0
    ^^ But when the Dispatch returns from the TcpTransmitSegment we clear 
DelayedAck back to zero such that we never acknowledge the last packet we 
received.

    TcpTickingDpc No timer active or expired
    TcpTickingDpc Tcb: 0x49918bd0, DelayedAck=0, CtrlFlag: 0x1019
    TcpTickingDpc No timer active or expired

In cases where TCP is waiting for more data from the remote endpoint and the 
endpoint is waiting for acknowledgement (like a situation where the sliding 
window is small or closed) the remote endpoint will retransmit the last packet 
but for reasons I don't yet understand duplicate acks aren't sent (I think 
there's another bug around retransmitting acks but I haven't isolated it).   
The result is that the connection is reset by the remote endpoint after a 
timeout period elapses.


This seems kind of like a critical section / locking problem around the 
DelayedAck member.  I'm not sure if the issue stems from the fact that the DPC 
model allows a form of preemption at the same TPL level (in this case we have 
TCP4 at TPL_CALLBACK effectively preempting TCP4 at TPL_CALLBACK) or if the 
issue is that the TCP4 driver is not managing the DelayedAck variable correctly 
given the rules for DPC routines (btw: what are the rules?).


I'd like an explanation for what the expected DPC preemption model is - are we 
really supposed to be nesting dispatch calls and are all network components 
expected to handle this level or reentrancy?

We are testing a change to remove DPC nesting which we believe should resolve 
this issue.  We would be happy to provide the patch if there's agreement that 
this is the right way to resolve this.

I'd be open to a phone call / Lync meeting with the maintainers if you feel it 
would be helpful.

Thanks,

Eugene

_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to