Many thanks for your advice! I'll try to check the memory pools when I
reproduce the issue next time.

> Further, since it sounds like you initially had sockets configured in
> blocking mode, when the new socket tries to transmit, it will block trying
> to allocate TCP segments due to the exhausted memory pool.  The blocking
> will continue until the SO_SNDTIMEO timeout is reached or the memory
> exhaustion is resolved


To clarify, I ran two tests. In the first, all sockets used the
MSG_DONTWAIT flag for send() (non-blocking); in the second, no socket used
the flag (blocking). So from my point of view there should be no mixing of
blocking and non-blocking sends. I'm not sure I understand what you mean by
"initially configured in blocking mode". Does this mean that send() may
still block under certain circumstances (exhausted memory pool) even with
the MSG_DONTWAIT flag set, so I should set the O_NONBLOCK option on the
socket up front to ensure that send() never blocks?
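
For reference, a minimal sketch of the "set O_NONBLOCK on the socket" approach asked about here, written against plain POSIX socket calls. It assumes the lwIP socket layer is used through its POSIX-compatible names (lwIP provides lwip_fcntl() for this); the helper names set_nonblocking() and try_send() and the SEND_RETRY/SEND_FATAL codes are made up for illustration. Whether MSG_DONTWAIT alone is sufficient in lwIP is not settled in this thread, so the sketch simply uses both. ENOMEM is treated as "retry later" because it was observed in place of EWOULDBLOCK before the patch mentioned further down.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result codes for try_send(); not part of any socket API. */
#define SEND_RETRY (-1)  /* nothing sent now, try again later */
#define SEND_FATAL (-2)  /* real error, inspect errno */

/* Put the socket itself into non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* One non-blocking send attempt.
 * Returns the number of bytes accepted (>= 0), SEND_RETRY or SEND_FATAL. */
long try_send(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_DONTWAIT);
    if (n >= 0) {
        return (long)n;
    }
    /* Assumption: ENOMEM means the stack is temporarily out of buffers,
     * so it is handled like EWOULDBLOCK instead of as a hard failure. */
    if (errno == EWOULDBLOCK || errno == EAGAIN || errno == ENOMEM) {
        return SEND_RETRY;
    }
    return SEND_FATAL;
}
```

A caller would keep the unsent data, go back to select(), and retry on SEND_RETRY, and only drop the connection on SEND_FATAL.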

Daniel

On Fri, Dec 30, 2016 at 6:30 PM, Joel Cunningham <[email protected]>
wrote:

>
> On Dec 30, 2016, at 10:43 AM, Daniel Pauli <[email protected]> wrote:
>
>> I'm a little confused about the use of select in your application.  Are
>> you using it with blocking sockets?
>
>
> I tested with both blocking and non-blocking send. I observed that
> non-blocking send (MSG_DONTWAIT flag set) on sockets determined as
> write-ready by select() sometimes returned ENOMEM when "stale sockets" are
> around. After applying the patch from
> http://lwip.100.n7.nabble.com/bug-49684-lwip-netconn-do-writemore-non-blocking-ERR-MEM-treated-as-failure-td27860.html,
> I got EWOULDBLOCK errors instead.
>
>
> Thanks for including this information. The ENOMEM gives me a good clue of
> what’s most likely going on.  My guess is that you’re experiencing memory
> pool exhaustion: the stale socket has claimed memory from a pool for the
> segments which are queued for transmit.  Since those segments are not being
> ACKed in the half-open state, the claimed memory won’t be available until
> the segments are freed (which happens on transmission timeout or when the
> socket is aborted).
>
> Further, since it sounds like you initially had sockets configured in
> blocking mode, when the new socket tries to transmit, it will block trying
> to allocate TCP segments due to the exhausted memory pool.  The blocking
> will continue until the SO_SNDTIMEO timeout is reached or the memory
> exhaustion is resolved.
>
> If you have LwIP stats enabled, you can check the memory pools for errors
> to figure out which one is failing.  You should be able to resolve this by
> sizing your memory pools to handle the number of supported connections.
> For example if you only support 5 simultaneous TCP connections, then your
> pools should be big enough to allocate 5 send buffers worth of segments.
> This is how I configure my products, which typically have plenty of RAM.
> Not sure what the recommendation is for very constrained RAM products.
>
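
The sizing rule described above could look roughly like the following lwipopts.h fragment. This is a sketch, not a recommendation: the numbers are illustrative and not taken from the thread, and MAX_TCP_CONNECTIONS is a hypothetical helper macro, not an lwIP option. The other names are standard lwIP options, and the TCP_SND_QUEUELEN expression mirrors lwIP's default.

```c
/* lwipopts.h fragment: size the TCP pools for a fixed number of connections.
 * All values are illustrative assumptions. */

#define MAX_TCP_CONNECTIONS  5   /* hypothetical helper, not an lwIP option */

#define TCP_MSS              1460
#define TCP_SND_BUF          (4 * TCP_MSS)   /* per-connection send buffer */
#define TCP_SND_QUEUELEN     ((4 * (TCP_SND_BUF) + (TCP_MSS - 1)) / (TCP_MSS))

/* One PCB per supported connection. */
#define MEMP_NUM_TCP_PCB     MAX_TCP_CONNECTIONS

/* Enough segments for every connection to fill its send queue at once,
 * so one stalled connection cannot starve the others. */
#define MEMP_NUM_TCP_SEG     (MAX_TCP_CONNECTIONS * TCP_SND_QUEUELEN)

/* Assumption: the heap (MEM_SIZE) must also be large enough to hold the
 * queued payload of all connections; it is not sized here. */

/* Enable statistics so pool allocation errors can be inspected. */
#define LWIP_STATS           1
#define MEMP_STATS           1
#define MEM_STATS            1
```

With these example values each connection may queue up to 16 segments, so the segment pool holds 80 entries.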
>
>> Calling close() will initiate a graceful synchronized closure of the
>> connection.  This means continuing to send any queued data until it is
>> ACKed, the send times out, or we received a RST.  Then a FIN is sent
>> indicating the sending pathway is closed.
>
>
> So there's no direct way for the application to tell LWIP to just give up
> on one socket without further trying to send data? Can the application
> specify a send timeout?
>
>
> Yes there is, with SO_LINGER you can perform an abortive closure rather
> than graceful by setting the timeout to 0.  Typically this is a bad idea.
> There’s a decent discussion here on stackoverflow:
>
> http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required
>
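
A minimal sketch of the abortive closure described above, using the standard SO_LINGER option with a zero timeout. The function names set_abortive_linger() and abortive_close() are made up for illustration. On lwIP this assumes SO_LINGER support is compiled in (LWIP_SO_LINGER set to 1 in lwipopts.h). As noted above, discarding queued data this way is typically a bad idea and is only meant for connections already known to be dead.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Enable lingering with a timeout of 0 seconds. A subsequent close() then
 * aborts the connection instead of closing it gracefully.
 * Returns 0 on success, -1 on error. */
int set_abortive_linger(int fd)
{
    struct linger lg;
    lg.l_onoff = 1;   /* lingering enabled */
    lg.l_linger = 0;  /* zero timeout: drop unsent data on close() */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Abort the connection and release the descriptor.
 * Returns 0 on success, -1 on error. */
int abortive_close(int fd)
{
    if (set_abortive_linger(fd) != 0) {
        return -1;
    }
    return close(fd);
}
```

An application could call abortive_close() on a socket that has failed to become write-ready for some application-defined period, instead of a plain close().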
>
> Lastly, what version of LwIP are you using?
>
>
> I'm using 2.0.0 RC1
>
>
> Joel
>
> On Wed, Dec 28, 2016 at 4:23 PM, Joel Cunningham <[email protected]>
> wrote:
>
>>
>>
>> On Dec 28, 2016, at 06:45 AM, Daniel Pauli <[email protected]> wrote:
>>
>>> Am I understanding the description correctly that sending on the stale
>>> connection eventually blocks once the remote side has crashed and this
>>> prevents sending on the new socket (only because the thread is blocked)?
>>>
>>> If so, then the socket buffer on the stale socket has filled up (most
>>> likely) and is now blocking.  This is blocking I/O operating as expected
>>> when data is not being acknowledged.  You should use non-blocking sockets
>>> and select if your server is servicing multiple sockets on a single thread.
>>>
>>> Joel
>>>
>>
>> Attempting to send on the stale socket blocks, which is okay on its own.
>> But I'm already using select() and observed that
>>
>>
>>
>> these stale sockets still somehow seem to block communication over new
>> sockets,
>>
>>
>> If this is actually happening as described, that would be
>> unexpected/faulty behavior.  One TCP socket in the half-open state should
>> not have any effect on the other TCP connections.
>>
>>
>> even when no stale sockets are included in the write set of select().
>>
>>
>> I'm a little confused about the use of select in your application.  Are
>> you using it with blocking sockets?  Select reporting writability doesn't
>> guarantee the send call won't block.  If you have a blocking socket and the
>> size in the send call can't fit in the amount of available buffer space,
>> the call will block.
>>
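
To illustrate the point about select(), here is a small sketch that waits for writability with a timeout and then sends with MSG_DONTWAIT, so the call returns immediately even if the data does not fit in the available buffer space. It uses plain POSIX calls; wait_writable() and send_if_writable() are hypothetical helper names, not lwIP functions.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Wait until fd is reported writable.
 * Returns 1 if writable, 0 on timeout, -1 on select() error. */
int wait_writable(int fd, long timeout_ms)
{
    fd_set wfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (rc < 0) {
        return -1;
    }
    return (rc > 0 && FD_ISSET(fd, &wfds)) ? 1 : 0;
}

/* Send without blocking in send() itself.
 * Returns the number of bytes accepted, which may be less than len,
 * or -1 with errno set (EWOULDBLOCK if select() timed out). */
ssize_t send_if_writable(int fd, const void *buf, size_t len, long timeout_ms)
{
    int ready = wait_writable(fd, timeout_ms);
    if (ready < 0) {
        return -1;
    }
    if (ready == 0) {
        errno = EWOULDBLOCK;
        return -1;
    }
    /* Writability only means some space is available, so a partial send
     * is possible and the caller has to resubmit the remainder. */
    return send(fd, buf, len, MSG_DONTWAIT);
}
```

The caller is expected to loop over the remaining bytes, which also keeps a single stalled socket from holding up the thread that services the others.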
>>
>> I even close() (successfully, according to the return value) those stale
>> sockets after they failed to be write-ready after 10 seconds, but I can see
>> in Wireshark that LWIP still sends retransmissions from the port number of
>> the closed socket.
>>
>> Could it be that close() cannot send FIN because the output buffer is
>> full, so the socket still remains active? Is there a way from the API to
>> just drop the connection without involving any more communication?
>>
>>
>> Calling close() will initiate a graceful synchronized closure of the
>> connection.  This means continuing to send any queued data until it is
>> ACKed, the send times out, or we received a RST.  Then a FIN is sent
>> indicating the sending pathway is closed.
>>
>> Lastly, what version of LwIP are you using?
>>
>> Joel
>>
>> _______________________________________________
>> lwip-users mailing list
>> [email protected]
>> https://lists.nongnu.org/mailman/listinfo/lwip-users
>>
