Jonathan Larmour wrote:
The comment could be better positioned. The bit:
/* if ERR_MEM, we wait for sent_tcp or poll_tcp to be called
applies to the previous block.
The bit:
on other errors we don't try writing any more */
applies to the block the comment is presently in.
That would be my fault. The comment is indeed a little confusing!
If tcp_enqueue returns ERR_MEM, the present code is correct - it is not a
fatal error, it just means that the write should be retried. The netconn
layer will do this as a result of its sent_tcp() and poll_tcp() callbacks.
The calling thread will only be woken up when the data really is sent, or
there _is_ a fatal error.
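
To make the retry logic above concrete, here is a small stand-alone
simulation. It is only a sketch and not the lwIP source: every demo_* name
is made up for illustration, and it simplifies by treating "successfully
enqueued" as "sent". It shows the ERR_MEM-style result leaving the caller
blocked until a later callback frees space, while a fatal error wakes the
caller at once.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real lwIP definitions. */
typedef enum { DEMO_ERR_OK = 0, DEMO_ERR_MEM = -1, DEMO_ERR_CONN = -2 } demo_err_t;

struct demo_conn {
    size_t queue_free;     /* free space in the simulated send queue */
    size_t pending;        /* bytes the caller still wants to enqueue */
    int caller_blocked;    /* 1 while the calling thread would be asleep */
    demo_err_t result;     /* what the caller sees when it is woken up */
    int fatal;             /* set to 1 to simulate a fatal connection error */
};

/* Simulated enqueue: all-or-nothing, out of memory if it does not fit. */
static demo_err_t demo_enqueue(struct demo_conn *c, size_t len)
{
    if (c->fatal) {
        return DEMO_ERR_CONN;
    }
    if (len > c->queue_free) {
        return DEMO_ERR_MEM;
    }
    c->queue_free -= len;
    return DEMO_ERR_OK;
}

/* One write attempt; run once by the caller, then again from callbacks. */
static void demo_try_write(struct demo_conn *c)
{
    demo_err_t err = demo_enqueue(c, c->pending);
    if (err == DEMO_ERR_MEM) {
        /* Not fatal: leave the caller blocked and wait for a callback. */
        c->caller_blocked = 1;
        return;
    }
    if (err == DEMO_ERR_OK) {
        c->pending = 0;
    }
    /* Success or fatal error: wake the caller with the result. */
    c->result = err;
    c->caller_blocked = 0;
}

/* Simulated "sent" callback: the peer ACKed some bytes, so retry. */
static void demo_on_sent(struct demo_conn *c, size_t acked)
{
    c->queue_free += acked;
    if (c->pending > 0) {
        demo_try_write(c);
    }
}
```

With queue_free = 100 and pending = 300, demo_try_write() leaves the caller
blocked; it is only released once demo_on_sent() has freed enough space, or
once the fatal flag is set.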
Everything I read in this subject seems perfectly fine to me:
- data gets queued up, the remote host doesn't ACK fast enough
- at one point, the application thread using the socket API gets blocked
while the tcpip thread _isn't_ blocked but processes the tcp pcb(s)
As already noticed, there are 2 settings that limit the data enqueued on
one PCB: the sendbuf (in bytes) and the queuelen (the number of pbufs
being queued - unsent and unacked - for one tcp pcb).
In my opinion, what Piero sees is intended behaviour of lwIP: the
queuelen reaches the predefined limit. Note that this was a u8_t but is
now (1.3.0) a u16_t, so by setting it to 0xffff, you can effectively
disable this check if you want.
The fact that one limit is tested in api_msg.c (check sendbuf before
calling tcp_write) and the other is checked in tcp_out.c is due to the
nature of the limits: if the sendbuf can't accept all the data, we can
send less data, but if the queuelen has reached the limit there is
nothing that can be done in api_msg.c so no need to add extra code for it!
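
The difference between the two limits can be sketched like this. Again this
is a simplified model, not the code from api_msg.c or tcp_out.c: the
demo_pcb struct and both functions are hypothetical names. The point is
that the byte limit can be handled by writing less, whereas the queue
length limit can only accept or refuse.

```c
#include <stddef.h>

/* Hypothetical model of the two per-pcb limits; not the lwIP struct. */
struct demo_pcb {
    size_t sndbuf_free;      /* bytes still free in the send buffer */
    unsigned queuelen;       /* pbufs currently queued (unsent + unacked) */
    unsigned queuelen_max;   /* configured queue length limit */
};

/* API-layer style check: shrink the write to what the buffer accepts. */
static size_t demo_clamp_to_sndbuf(const struct demo_pcb *pcb, size_t len)
{
    return len < pcb->sndbuf_free ? len : pcb->sndbuf_free;
}

/*
 * Output-layer style check: the queue length cannot be satisfied by
 * sending less, so the enqueue either succeeds or is refused.
 * Returns 0 on success, -1 if a limit would be exceeded.
 */
static int demo_enqueue(struct demo_pcb *pcb, size_t len, unsigned pbufs_needed)
{
    if (len == 0) {
        return 0;
    }
    if (len > pcb->sndbuf_free) {
        return -1;
    }
    if (pcb->queuelen + pbufs_needed > pcb->queuelen_max) {
        return -1;
    }
    pcb->queuelen += pbufs_needed;
    pcb->sndbuf_free -= len;
    return 0;
}
```

A caller would first clamp the length with demo_clamp_to_sndbuf() and then
call demo_enqueue(); when the queue is already full, even a small write
that fits the send buffer is refused.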
About the blocking of lwIP: as Jonathan already said, socket
implementations DO block by default (in situations as described, for
example). Most of them can be told to not block (using O_NONBLOCK or
something). However, lwIP does not fully support this at the moment.
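
For comparison, this is roughly what "telling a socket not to block" looks
like on an ordinary POSIX host. It is not lwIP code, and given the above it
should not be assumed to behave the same way on lwIP; the two helper names
are made up for this sketch. Once O_NONBLOCK is set, a send into a full
buffer returns EAGAIN/EWOULDBLOCK instead of putting the thread to sleep.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Put a descriptor into non-blocking mode; returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Keep sending until the socket would block. Returns the number of
 * bytes accepted before that point, or -1 on any other error.
 * Only call this on a non-blocking socket whose peer is not reading.
 */
static long fill_until_would_block(int fd)
{
    char chunk[1024] = {0};
    long total = 0;

    for (;;) {
        ssize_t n = send(fd, chunk, sizeof chunk, 0);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return total;   /* buffer full: a blocking socket would sleep here */
        }
        return -1;
    }
}
```

On a blocking socket the same loop would simply stall in send() once the
buffers fill up, which is the default behaviour described above.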
The fact that the problem disappears when disabling the Nagle algorithm
could indeed mean the Nagle algorithm has a bug. But to get to the
source of this, a detailed analysis of the packet flow (using
ethereal/wireshark) as well as a log output of the target running lwIP
would be useful!
Simon