On Thu, Jun 9, 2016 at 3:11 PM, Michael Brown <mc...@ipxe.org> wrote:
> On 09/06/16 13:44, Ladi Prosek wrote:
>>>
>>> Do you know what prevents the usual TCP retransmission mechanism from
>>> recovering? ARP discovery should still work even for retransmitted
>>> packets.
>>
>>
>> Just like you wrote in the ipxe-devel thread linked from the commit
>> description, from the client point of view the connection is "stable".
>> Everything the client has sent has been acked so the retransmission
>> timer is not running. The server is retransmitting for sure but its
>> packets just can't reach the client - they're routed somewhere else or
>> are blackholed altogether.
>
>
> Understood that the client (iPXE) will not be retransmitting, but that still
> doesn't explain what happens to the server's retransmitted packets.
>
>> I can get to this state easily by configuring my virtual NIC with the
>> hardcoded default MAC. There are more such hosts on the network
>> claiming the same MAC so sooner or later I find myself cut off.
>
>
> OK, but in that situation we don't expect traffic to get through anyway;
> it's a broken setup.

Yes, there definitely has to be something broken about the network
setup for this to make a difference. Unfortunately, and please excuse
my pragmatism, that's often the reality and finding/fixing the root
cause may be prohibitively hard.

> I'm trying to think of a situation in which this situation could arise in a
> non-broken setup, to convince myself that this is something we should be
> adding. The best I can think of off-hand is where iPXE is behind some kind
> of NAT, and the NATting device has lost track of the relevant state.

NAT losing state is definitely one plausible case. Another could be
some kind of a multi-path setup where failover has just happened and
the new path is unaware of the connection. Or a virtual machine that
has just been migrated to another part of the network and the
infrastructure is still learning its new location, whatever that means
:-p Hosts that have just come up and are booting could face all kinds
of network instability problems.

That's all I can offer in terms of supporting arguments. I know that
it's been tried and it works. But I also know that there's no RFC to
refer to, it's a grey territory at best.

>> That sounds good. Under certain circumstances this may generate
>> otherwise unnecessary traffic so I just want to be careful. For
>> example if it's an HTTP connection and it is kept alive (as in HTTP
>> keepalive), it will look idle and will be pinging the server with
>> keepalives periodically even though it's not waiting for anything. Big
>> deal? Probably not. Worth adding a way for upper layers to signal this
>> down to the TCP implementation? Probably not either.
>
>
> I think we should keep it as simple as possible. Always send keepalives on
> any established connection, use start_timer_fixed() with some period long
> enough to not be disruptive to real traffic (e.g. 15 seconds), reset the
> timer whenever any packet is received on that connection.
>
> It might also be desirable to use the common transmit path to send the
> keepalive packet, if that can cleanly be made to result in smaller and
> simpler code. I don't think we need to use the (seq-1) trick since we're
> not aiming to elicit a response unless the remote end genuinely has
> something it's already trying to send. We should be able to just send an
> unsolicited pure ACK, which the existing transmit path can already create
> via the TCP_ACK_PENDING flag.

I'm all for making it as simple as possible. Unifying it with the
regular transmit path sounds great. The seq-1 trick could make it
marginally easier to follow what's going on when looking at the
traffic with a network analyzer but it's certainly not required.

Should I prepare a simplified v2 of the patch?

Thanks,
Ladi
_______________________________________________
ipxe-devel mailing list
ipxe-devel@lists.ipxe.org
https://lists.ipxe.org/mailman/listinfo.cgi/ipxe-devel