We have isolated a problem where the TCP4 driver fails to acknowledge received
data under certain circumstances.
Background on DPCs: The DPC mechanism allows functions to be called at a later
point in time at a requested TPL level. Functions are queued through the DPC
Protocol's QueueDpc interface. DPCs are dispatched anytime a module calls the
DispatchDpc function in the DPC protocol. The dispatch process will execute
all queued DPCs that were registered to execute at or above the caller's TPL
level (e.g. if caller is at TPL_CALLBACK the DPC dispatch will execute anything
between CALLBACK and HIGH_LEVEL).
The stack depends on DispatchDpc being called at appropriate preemption points
to advance packet processing. The dispatch function is called in multiple
layers as you can see by searching for DispatchDpc with calls originating from
ArpDxe, Ip4Dxe, MnpDxe, Udp4Dxe, UefiPxeBcDxe, DnsDxe, HttpDxe, Ip6Dxe and
Udp6Dxe.
Currently it's possible for DPC dispatching to occur in a nested manner.
Imagine a case where an upper network stack layer queues a DPC (for example the
TCP layer's TcpTickingDpc) and in the DPC handler it makes use of lower layers
(like sending a packet through IP+MNP). As part of packet processing these
lower layers will call DispatchDpc resulting in nested DPCs execution.
Here's an example of the DPC nesting, the indent level indicates the level of
nesting for the DPC
TcpTicking
DpcDispatchDpc TPL=CALLBACK
TcpTickingDpc
TcpTickingDpc Tcb: 0x49a42dd0, DelayedAck=1, CtrlFlag: 0x1019
TcpTickingDpc call TcpSendAck delayed is: 1
TcpSendAck Seq=158464588 Ack=4152002304
TcpTransmitSegment DelayedAck = 0
Nbuf SendIpPacket: DestPort=62462 Seq=158464588 Ack=4152002304 Window=19840
MnpSyncSendPacket call DispatchDpc
DpcDispatchDpc TPL=CALLBACK
Ip4AccpetFrame call DispatchDpc
DpcDispatchDpc TPL=CALLBACK
Ip4FreeTxToken call DispatchDpc
DpcDispatchDpc TPL=CALLBACK
TcpInput: DestPort=8816 Seq=4152002304 Ack=158464588 Len=1460
TcpClearTimer Tcb: 0x49a42dd0
Notice how the process of sending the TCP ACK via IP then MNP causes another
DispatchDpc to occur before the TCP segment transmit call returns. This
nesting continues on and a whole bunch of code has executed (all at CALLBACK
TPL). You can see near the end that we even begin processing another TCP
packet.
If we look in detail what the TcpSendAck does there are two key steps:
if (TcpTransmitSegment (Tcb, Nbuf) == 0) {
TCP_CLEAR_FLG (Tcb->CtrlFlag, TCP_CTRL_ACK_NOW);
Tcb->DelayedAck = 0;
}
It transmits the segment and after the transmit returns it clears the
DelayedAck counter since the presumption is that we sent an ACK for the most
recent segment and we are caught up. But because DPCs are dispatched within
TcpTransmitSegment the assumption that this transmit was the last one is
incorrect.
Here is a portion of a trace illustrating the problem:
TcpTicking
DpcDispatchDpc TPL=CALLBACK
TcpTickingDpc
TcpTickingDpc Tcb: 0x49a42dd0, DelayedAck=1, CtrlFlag: 0x1019
TcpTickingDpc call TcpSendAck delayed is: 1
TcpSndAck Seq=158464588 Ack=4152002304
TcpTransmitSegment DelayedAck = 0
Nbuf SendIpPacket: DestPort=62462 Seq=158464588 Ack=4152002304 Window=19840
MnpSyncSendPacket call DispatchDpc
DpcDispatchDpc TPL=CALLBACK
[snip - a bunch of nested DPC processing removed]
DpcDispatchDpc Tpl= TPL=CALLBACK
TcpInput: DestPort=8816 Seq=4152019824 Ack=158464588 Len=1460
TcpClearTimer Tcb: 0x49a42dd0
TcpInput Seq=4152019824 Tcb: 0x49a42dd0, Tcb->DupAck = 0
TcpToSendAck add to delayedack Seq=158464588 Ack=4152021284
TcpToSendAck add to delayedack Tcb: 0x49a42dd0, Seq=158464588
Ack=4152021284, DelayedAck=1
^^ NOTE: the DelayedAck flag has been set to one indicating that we
haven't acknowledged yet and need to soon
[we return from 14 nested DPC calls !!]
TcpSndAck Tcb->DelayedAck = 0
^^ But when the Dispatch returns from the TcpTransmitSegment we clear
DelayedAck back to zero such that we never acknowledge the last packet we
received.
TcpTickingDpc No timer active or expired
TcpTickingDpc Tcb: 0x49918bd0, DelayedAck=0, CtrlFlag: 0x1019
TcpTickingDpc No timer active or expired
In cases where TCP is waiting for more data from the remote endpoint and the
endpoint is waiting for acknowledgement (like a situation where the sliding
window is small or closed) the remote endpoint will retransmit the last packet
but for reasons I don't yet understand duplicate acks aren't sent (I think
there's another bug around retransmitting acks but I haven't isolated it).
The result is that the connection is reset by the remote endpoint after a
timeout period elapses.
This seems kind of like a critical section / locking problem around the
DelayedAck member. I'm not sure if the issue stems from the fact that the DPC
model allows a form of preemption at the same TPL level (in this case we have
TCP4 at TPL_CALLBACK effectively preempting TCP4 at TPL_CALLBACK) or if the
issue is that the TCP4 driver is not managing the DelayedAck variable correctly
given the rules for DPC routines (btw: what are the rules?).
I'd like an explanation for what the expected DPC preemption model is - are we
really supposed to be nesting dispatch calls and are all network components
expected to handle this level or reentrancy?
We are testing a change to remove DPC nesting which we believe should resolve
this issue. We would be happy to provide the patch if there's agreement that
this is the right way to resolve this.
I'd be open to a phone call / Lync meeting with the maintainers if you feel it
would be helpful.
Thanks,
Eugene
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel