Re: [lwip-users] TCP send() fails when other sockets perform retransmissions

Joel Cunningham Fri, 30 Dec 2016 14:10:15 -0800

> On Dec 30, 2016, at 12:59 PM, Daniel Pauli <[email protected]> wrote:
> 
> Many thanks for your advice! I'll try to check the memory pools when I 
> reproduce the issue next time.
> 
> Further, since it sounds like you initially had sockets configured in 
> blocking mode, when the new socket tries to transmit, it will block trying to 
> allocate TCP segments due to the exhausted memory pool.  The blocking will 
> continue until SO_SNDTIMEO is reached or the memory exhaustion is resolved.
> 
> To clarify, I ran two tests: In the first, all sockets used the MSG_DONTWAIT 
> flag for send() (non-blocking), in the second no socket used the flag 
> (blocking), so there should be no mixing of blocking/non-blocking from my 
> point of view. I'm not sure I understand what you mean by "initially 
> configured in blocking mode". Does this mean that send() may still block 
> under certain circumstances (exhausted memory pool) even with MSG_DONTWAIT 
> flag set, so I should initially set the O_NONBLOCK option on the socket to 
> ensure that send() never blocks?
>


I was referring to your original post, where you described seeing blocking 
when sending on a new socket after a stale socket was left in a half-open state 
with submitted data.  I was attempting to explain why that happens, so that we 
know there is no erroneous behavior in LwIP.

My understanding of BSD socket semantics is that using MSG_DONTWAIT should be 
equivalent to setting O_NONBLOCK, except that you need to pass the flag on 
each call rather than setting the mode once.
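To make the per-call versus per-socket distinction concrete, here is a minimal sketch written against a POSIX host. The helper names send_nowait() and set_nonblocking() are hypothetical. On lwIP the same calls are usually reachable as lwip_send()/lwip_fcntl(), or under the standard names when LWIP_COMPAT_SOCKETS is enabled; as far as I know lwip_fcntl() only understands the O_NONBLOCK flag, so check your version.

```c
#include <errno.h>      /* callers check EAGAIN / EWOULDBLOCK */
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Per-call non-blocking: the flag must be passed on every send(). */
ssize_t send_nowait(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, MSG_DONTWAIT);
}

/* Per-socket non-blocking: set O_NONBLOCK once, and every later send()
 * (and recv()) on this socket is non-blocking without any flag.
 * Returns 0 on success, -1 on error (errno set). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    return 0;
}
```

Either way, a send() that cannot make progress returns -1 with errno set to EWOULDBLOCK (or EAGAIN) instead of waiting.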

> 
> On Fri, Dec 30, 2016 at 6:30 PM, Joel Cunningham <[email protected]> wrote:
> 
>> On Dec 30, 2016, at 10:43 AM, Daniel Pauli <[email protected]> wrote:
>> 
>> I'm a little confused about the use of select in your application.  Are you 
>> using it with blocking sockets?
>> 
>> I tested with both blocking and non-blocking send. I observed that 
>> non-blocking send (MSG_DONTWAIT flag set) on sockets determined as 
>> write-ready by select() sometimes returned ENOMEM when "stale sockets" are 
>> around. After applying the patch from 
>> http://lwip.100.n7.nabble.com/bug-49684-lwip-netconn-do-writemore-non-blocking-ERR-MEM-treated-as-failure-td27860.html,
>> I got EWOULDBLOCK errors instead.
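One way to act on the errors described above is to treat them as "try again after the next select() round" rather than as a broken connection. This is a sketch under assumptions: the helper name is hypothetical, and counting ENOMEM as transient only matches the unpatched 2.0.0 RC1 behavior reported here. An ENOMEM that never clears points at pool exhaustion, which retrying will not fix.

```c
#include <errno.h>
#include <stdbool.h>

/* Decide whether a failed non-blocking send() should simply be retried
 * once select() reports the socket writable again. */
bool send_error_is_transient(int err)
{
    switch (err) {
    case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:   /* only a separate case where the values differ */
#endif
    case EINTR:
    case ENOMEM:        /* assumption: transient ERR_MEM (unpatched 2.0.0 RC1) */
        return true;
    default:            /* e.g. ECONNRESET, EBADF: give up on this socket */
        return false;
    }
}
```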
>> 
> 
> Thanks for including this information. The ENOMEM gives me a good clue about 
> what’s most likely going on.  My guess is that you’re experiencing memory 
> pool exhaustion, and that the stale socket has claimed memory from a pool for 
> the segments queued for transmission.  Since those segments are not being 
> ACKed in the half-open state, the claimed memory won’t be available until the 
> segments are freed (which happens on a transmission timeout or when the socket 
> is aborted).
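The mechanism described here can be illustrated with a toy model. This is not lwIP's memp code; the pool size and every name below are hypothetical. It only shows how segments held by a half-open connection starve a second connection that draws from the same shared pool.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT lwIP's implementation: one fixed-size pool of segment
 * descriptors shared by every connection (hypothetical size). */
#define POOL_SEGS 8

struct seg_pool { size_t in_use; };
struct toy_conn { size_t queued; };  /* segments held until ACKed or aborted */

/* Queue one segment on c; fails (think ERR_MEM / ENOMEM) when the pool is empty. */
bool conn_queue_segment(struct seg_pool *pool, struct toy_conn *c)
{
    if (pool->in_use == POOL_SEGS)
        return false;
    pool->in_use++;
    c->queued++;
    return true;
}

/* An ACK, a transmission timeout or an abort releases up to n segments. */
void conn_release_segments(struct seg_pool *pool, struct toy_conn *c, size_t n)
{
    if (n > c->queued)
        n = c->queued;
    c->queued -= n;
    pool->in_use -= n;
}
```

A stale connection that keeps queuing without ever being ACKed ends up holding the whole pool, and the fresh connection's first allocation fails until the stale one is released.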
> 
> Further, since it sounds like you initially had sockets configured in 
> blocking mode, when the new socket tries to transmit, it will block trying to 
> allocate TCP segments due to the exhausted memory pool.  The blocking will 
> continue until SO_SNDTIMEO is reached or the memory exhaustion is resolved.
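A sketch of bounding the blocking time with SO_SNDTIMEO, written for a POSIX host; set_send_timeout() is a hypothetical helper. In lwIP the option has to be enabled with LWIP_SO_SNDTIMEO, and it takes a struct timeval unless LWIP_SO_SNDRCVTIMEO_NONSTANDARD is set (then it takes an int in milliseconds); check opt.h for your version.

```c
#include <errno.h>      /* callers check EAGAIN / EWOULDBLOCK after a timeout */
#include <sys/socket.h>
#include <sys/time.h>

/* Limit how long a blocking send() may wait for buffer space.
 * Returns 0 on success, -1 on error (errno set). */
int set_send_timeout(int fd, long timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
}
```

On Linux at least, a blocking send() that hits the timeout before sending anything returns -1 with errno EAGAIN/EWOULDBLOCK; one that got partway returns the number of bytes sent.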
> 
> If you have LwIP stats enabled, you can check the memory pools for errors to 
> figure out which one is failing.  You should be able to resolve this by 
> sizing your memory pools to handle the number of supported connections.  For 
> example, if you only support 5 simultaneous TCP connections, then your pools 
> should be big enough to allocate 5 send buffers’ worth of segments.  This is 
> how I configure my products, which typically have plenty of RAM.  I’m not sure 
> what the recommendation is for products with very constrained RAM.
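As a rough illustration of that sizing rule, here is an lwipopts.h excerpt for the 5-connection example. The numbers and APP_MAX_TCP_CONNS are assumptions, not recommendations; TCP_SND_QUEUELEN uses the same formula as the opt.h default. Copied send data typically also comes out of the MEM_SIZE heap, so that may need room for the same number of send buffers. With LWIP_STATS and MEMP_STATS enabled, the per-pool err counters (lwip_stats.memp[i]->err in 2.0.x, worth verifying against your stats.h) show which pool is running dry.

```c
/* lwipopts.h excerpt: sizing sketch, all values are illustrative assumptions. */
#define APP_MAX_TCP_CONNS   5   /* hypothetical application-level limit */

#define TCP_MSS             1460
#define TCP_SND_BUF         (4 * TCP_MSS)
/* Same formula as the TCP_SND_QUEUELEN default in lwIP's opt.h. */
#define TCP_SND_QUEUELEN    ((4 * (TCP_SND_BUF) + (TCP_MSS - 1)) / (TCP_MSS))

#define MEMP_NUM_TCP_PCB    APP_MAX_TCP_CONNS
/* Enough segments for every connection's send queue to be full at once. */
#define MEMP_NUM_TCP_SEG    (MEMP_NUM_TCP_PCB * TCP_SND_QUEUELEN)

/* Guard in case someone later hard-codes a smaller pool. */
#if MEMP_NUM_TCP_SEG < (APP_MAX_TCP_CONNS * TCP_SND_QUEUELEN)
#error "MEMP_NUM_TCP_SEG cannot hold a full send queue per connection"
#endif
```

With these values each connection may queue up to 16 segments, so the segment pool needs 5 × 16 = 80 entries.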
> 
>> 
>> Calling close() will initiate a graceful synchronized closure of the 
>> connection.  This means continuing to send any queued data until it is 
>> ACKed, the send times out, or a RST is received.  Then a FIN is sent, 
>> indicating that the sending pathway is closed.
>> 
>> So there's no direct way for the application to tell LWIP to just give up on 
>> one socket without trying to send any more data? Can the application specify 
>> a send timeout?
> 
> Yes, there is: with SO_LINGER you can perform an abortive closure rather than 
> a graceful one by enabling linger with a timeout of 0.  Typically this is a bad 
> idea.  There’s a decent discussion of it on Stack Overflow:
> 
> http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required
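For completeness, a minimal abortive-close sketch using SO_LINGER with a zero timeout, written for a POSIX host. abortive_close() is a hypothetical helper, and on lwIP this assumes your build has SO_LINGER support enabled (look for LWIP_SO_LINGER in opt.h). As the linked discussion explains, the resulting RST discards any unsent data, so it is best kept for connections you have already given up on, such as the stale ones here.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Abortive close: linger enabled with a 0 s timeout makes close() drop any
 * queued data and reset the connection instead of the FIN handshake.
 * Returns 0 on success. If SO_LINGER cannot be set, the descriptor is still
 * closed normally (graceful close) and -1 is returned. */
int abortive_close(int fd)
{
    struct linger lg;
    lg.l_onoff = 1;
    lg.l_linger = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```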
> 
>> 
>> Lastly, what version of LwIP are you using?
>> 
>> I'm using 2.0.0 RC1
> 
> Joel
> 
>> On Wed, Dec 28, 2016 at 4:23 PM, Joel Cunningham <[email protected]> wrote:
>> 
>> 
>> On Dec 28, 2016, at 06:45 AM, Daniel Pauli <[email protected]> wrote:
>> 
>>> Am I understanding the description correctly that sending on the stale 
>>> connection eventually blocks once the remote side has crashed and this 
>>> prevents sending on the new socket (only because the thread is blocked)?
>>> 
>>> If so, then the socket buffer on the stale socket has filled up (most 
>>> likely) and is now blocking.  This is blocking I/O operating as expected 
>>> when data is not being acknowledged.  You should use non-blocking sockets 
>>> and select if your server is servicing multiple sockets on a single thread.
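A minimal sketch of the select()-based pattern for several sockets on one thread, written for a POSIX host and assuming all descriptors are below FD_SETSIZE. select_writable() is a hypothetical helper; on lwIP the call maps to lwip_select().

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>  /* callers pair this with send(..., MSG_DONTWAIT) */
#include <sys/time.h>

/* Wait up to timeout_ms until at least one of fds[0..n) is writable.
 * ready[i] is set to 1 for each writable socket, 0 otherwise.
 * Returns the number of writable sockets, 0 on timeout, -1 on error. */
int select_writable(const int *fds, size_t n, long timeout_ms, int *ready)
{
    fd_set wfds;
    struct timeval tv;
    int maxfd = -1;

    FD_ZERO(&wfds);
    for (size_t i = 0; i < n; i++) {
        FD_SET(fds[i], &wfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(maxfd + 1, NULL, &wfds, NULL, &tv);
    for (size_t i = 0; i < n; i++)
        ready[i] = (rc > 0 && FD_ISSET(fds[i], &wfds)) ? 1 : 0;
    return rc;
}
```

Only sockets with ready[i] set get a send() call in that round, and that call should still use MSG_DONTWAIT, since writability says nothing about how much data will fit.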
>>> 
>>> Joel
>>> 
>>> Attempting to send on the stale socket blocks, which is okay on its own. 
>>> But I'm already using select() and observed that
>> 
>>  
>>> 
>>> these stale sockets still somehow seem to block communication over new 
>>> sockets,
>> 
>> 
>> If this is actually happening as described, that would be unexpected/faulty 
>> behavior.  One TCP socket in the half-open state should not have any effect 
>> on the other TCP connections.
>>  
>>> 
>>> even when no stale sockets are included in the write set of select().
>> 
>> 
>> I'm a little confused about the use of select in your application.  Are you 
>> using it with blocking sockets?  Select returning writability doesn't 
>> guarantee that the send call won't block.  If you have a blocking socket and 
>> the size passed to send can't fit in the available buffer space, the call 
>> will block.
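Since writability only means some buffer space is free, a server that must never block can keep a per-socket outgoing buffer and hand the stack only what currently fits. A sketch for a POSIX host; struct pending and flush_pending() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-socket outgoing buffer. */
struct pending {
    const char *data;
    size_t len;
    size_t off;   /* bytes already accepted by the stack */
};

/* Call when select() reports fd writable. Never blocks: sends as much as
 * currently fits and remembers where it stopped.
 * Returns 1 when everything is sent, 0 if data remains (wait for the next
 * select() round), -1 on a hard error (errno set). */
int flush_pending(int fd, struct pending *p)
{
    while (p->off < p->len) {
        ssize_t n = send(fd, p->data + p->off, p->len - p->off, MSG_DONTWAIT);
        if (n > 0) {
            p->off += (size_t)n;
            continue;
        }
        if (n == 0 || errno == EWOULDBLOCK || errno == EAGAIN || errno == EINTR)
            return 0;
        return -1;
    }
    return 1;
}
```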
>>  
>>> 
>>> I even close() (successfully, according to the return value) those stale 
>>> sockets once they have not become write-ready for 10 seconds, but I can see 
>>> in Wireshark that LWIP still sends retransmissions from the port number of 
>>> the closed socket. 
>>> 
>>> Could it be that close() cannot send FIN because the output buffer is full, 
>>> so the socket still remains active? Is there a way from the API to just 
>>> drop the connection without involving any more communication?
>> 
>> 
>> Calling close() will initiate a graceful synchronized closure of the 
>> connection.  This means continuing to send any queued data until it is 
>> ACKed, the send times out, or a RST is received.  Then a FIN is sent, 
>> indicating that the sending pathway is closed.
>> 
>> Lastly, what version of LwIP are you using?
>> 
>> Joel
>> 
>> _______________________________________________
>> lwip-users mailing list
>> [email protected]
>> https://lists.nongnu.org/mailman/listinfo/lwip-users
> 

Joel
