Many thanks for your advice! I'll try to check the memory pools when I reproduce the issue next time.
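For the pool sizing suggested in the quoted reply below, here is a rough sketch of the arithmetic as I understand it. The helper names and the example numbers are made up for illustration. As far as I can tell, the matching lwIP options would be MEMP_NUM_TCP_SEG (size of the segment pool) and TCP_SND_QUEUELEN (per-connection queue limit), but please treat that mapping as an assumption; segments are probably not the only pool involved.

```c
#include <stddef.h>

/*
 * Worst-case number of TCP segments queued for transmit at the same time,
 * assuming every connection fills its send queue (for example because the
 * peers stopped ACKing). Hypothetical helper, not part of lwIP.
 */
size_t required_tcp_segments(size_t max_connections, size_t queue_len_per_conn)
{
    return max_connections * queue_len_per_conn;
}

/*
 * Returns 1 if a segment pool with pool_size entries covers that worst case,
 * 0 otherwise. Hypothetical helper, not part of lwIP.
 */
int tcp_seg_pool_is_sufficient(size_t pool_size, size_t max_connections,
                               size_t queue_len_per_conn)
{
    return pool_size >= required_tcp_segments(max_connections, queue_len_per_conn);
}
```

With 5 connections and a queue limit of 8 segments each (made-up numbers), the worst case is 40 queued segments, so a pool of 16 entries could be exhausted by stale connections alone.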
> Further, since it sounds like you initially had sockets configured in
> blocking mode, when the new socket tries to transmit, it will block trying
> to allocate TCP segments due to the exhausted memory pool. The blocking
> will continue until SO_SNDTIMEOUT is reached or the memory exhaustion is
> resolved

To clarify, I ran two tests: in the first, all sockets used the MSG_DONTWAIT flag for send() (non-blocking); in the second, no socket used the flag (blocking). So from my point of view there should be no mixing of blocking and non-blocking sends.

I'm not sure I understand what you mean by "initially configured in blocking mode". Does this mean that send() may still block under certain circumstances (exhausted memory pool) even with the MSG_DONTWAIT flag set, so I should set the O_NONBLOCK option on the socket up front to ensure that send() never blocks?

Daniel

On Fri, Dec 30, 2016 at 6:30 PM, Joel Cunningham <[email protected]> wrote:

>
> On Dec 30, 2016, at 10:43 AM, Daniel Pauli <[email protected]> wrote:
>
>> I'm a little confused about the use of select in your application. Are
>> you using it with blocking sockets?
>
> I tested with both blocking and non-blocking send. I observed that
> non-blocking send (MSG_DONTWAIT flag set) on sockets determined as
> write-ready by select() sometimes returned ENOMEM when "stale sockets" are
> around. After applying the patch from
> http://lwip.100.n7.nabble.com/bug-49684-lwip-netconn-do-writemore-non-blocking-ERR-MEM-treated-as-failure-td27860.html,
> I got EWOULDBLOCK errors instead.
>
> Thanks for including this information. The ENOMEM gives me a good clue of
> what’s most likely going on. My guess is that you’re experiencing a memory
> pool exhaustion and the stale socket has claimed memory from a pool for the
> segments which are queued for transmit.
> Since those segments are not being
> ACKed in the half open state, the claimed memory won’t be available until
> the segments are freed (happens during transmission timeout or when socket
> is aborted)
>
> Further, since it sounds like you initially had sockets configured in
> blocking mode, when the new socket tries to transmit, it will block trying
> to allocate TCP segments due to the exhausted memory pool. The blocking
> will continue until SO_SNDTIMEOUT is reached or the memory exhaustion is
> resolved
>
> If you have LwIP stats enabled, you can check the memory pools for errors
> to figure out which one is failing. You should be able to resolve this by
> sizing your memory pools to handle the number of supported connections.
> For example if you only support 5 simultaneous TCP connections, then your
> pools should be big enough to allocate 5 send buffers worth of segments.
> This is how I configure my products, which typically have plenty of RAM.
> Not sure what the recommendation is for very constrained RAM products.
>
>> Calling close() will initiate a graceful synchronized closure of the
>> connection. This means continuing to send any queued data until it is
>> ACKed, the send times out, or we received a RST. Then a FIN is sent
>> indicating the sending pathway is closed.
>
> So there's no direct way for the application to tell LWIP to just give up
> on one socket without further trying to send data? Can the application
> specify a send timeout?
>
> Yes there is, with SO_LINGER you can perform an abortive closure rather
> than graceful by setting the timeout to 0. Typically this is a bad idea.
> There’s a decent discussion here on stackoverflow:
>
> http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required
>
> Lastly, what version of LwIP are you using?
>
> I'm using 2.0.0 RC1
>
> Joel
>
> On Wed, Dec 28, 2016 at 4:23 PM, Joel Cunningham <[email protected]>
> wrote:
>
>> On Dec 28, 2016, at 06:45 AM, Daniel Pauli <[email protected]> wrote:
>>
>>> Am I understanding the description correctly that sending on the stale
>>> connection eventually blocks once the remote side has crashed and this
>>> prevents sending on the new socket (only because the thread is blocked)?
>>>
>>> If so, then the socket buffer on the stale socket has filled up (most
>>> likely) and is now blocking. This is blocking I/O operating as expected
>>> when data is not being acknowledged. You should use non-blocking sockets
>>> and select if your server is servicing multiple sockets on a single thread.
>>>
>>> Joel
>>
>> Attempting to send on the stale socket blocks, which is okay on its own.
>> But I'm already using select() and observed that
>>
>> these stale sockets still somehow seem to block communication over new
>> sockets,
>>
>> If this is actually happening as described, that would be
>> unexpected/faulty behavior. One TCP socket in the half-open state should
>> not have any effect on the other TCP connections.
>>
>> even when no stale sockets are included in the write set of select().
>>
>> I'm a little confused about the use of select in your application. Are
>> you using it with blocking sockets? Select returning write-ability doesn't
>> guarantee the send call won't block. If you have a blocking socket and the
>> size in the send call can't fit in the amount of available buffer space,
>> the call will block
>>
>> I even close() (successfully, according to the return value) those stale
>> sockets after they failed to be write-ready after 10 seconds, but I can see
>> in Wireshark that LWIP still sends retransmissions from the port number of
>> the closed socket.
>>
>> Could it be that close() cannot send FIN because the output buffer is
>> full, so the socket still remains active?
>> Is there a way from the API to just drop the connection without involving
>> any more communication?
>>
>> Calling close() will initiate a graceful synchronized closure of the
>> connection. This means continuing to send any queued data until it is
>> ACKed, the send times out, or we received a RST. Then a FIN is sent
>> indicating the sending pathway is closed.
>>
>> Lastly, what version of LwIP are you using?
>>
>> Joel
>>
>> _______________________________________________
>> lwip-users mailing list
>> [email protected]
>> https://lists.nongnu.org/mailman/listinfo/lwip-users
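
The abortive close via SO_LINGER with a zero timeout, as mentioned in the quoted reply, could look roughly like the sketch below. It is written against the plain POSIX socket headers so that it compiles on a desktop system; on lwIP I assume the include would be lwip/sockets.h instead and that SO_LINGER support has to be enabled in the build. The function names are hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Configure the socket so that a later close() aborts the connection
 * (unsent data is discarded) instead of closing it gracefully.
 * Returns 0 on success, -1 on error. Hypothetical helper.
 */
int set_abortive_close(int fd)
{
    struct linger lg;

    lg.l_onoff = 1;   /* enable lingering ... */
    lg.l_linger = 0;  /* ... with a zero timeout, which requests an abort */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/*
 * Drop a connection without waiting for queued data to be ACKed.
 * Returns 0 on success, -1 on error. Hypothetical helper.
 */
int abort_connection(int fd)
{
    if (set_abortive_close(fd) != 0) {
        return -1;
    }
    return close(fd);
}
```

As noted in the thread, this is typically a bad idea for healthy connections, so it probably only makes sense for sockets that have already been judged stale, for example after they failed to become write-ready within the 10 second limit.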
_______________________________________________
lwip-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lwip-users
