Xiang,

Thanks for the concrete example of how to break IOB deadlock. If that is what's causing my problem, I will try it out.
cheers
adam

On Wed, Feb 19, 2020 at 6:11 PM Xiang Xiao <xiaoxiang781...@gmail.com> wrote:

> Here is a demo fix for one of the recent IOB deadlocks:
>
> commit 2d0baa779d997f39b8121f5965f8125184e80d71
> Author: chao.an <anc...@xiaomi.com>
> Date:   Thu Jan 16 14:20:09 2020 -0300
>
>     net/udp: break the network lock to avoid deadlock
>
>     Author: chao.an <anc...@xiaomi.com>
>
>         net/udp: break the network lock to avoid deadlock
>
>         network deadlock when udp sendto() storm is coming
>
>         net/close: force wait tx drain to complete
>
>         atomic send() and close() will causes data to be discarded directly
>
>         Signed-off-by: chao.an <anc...@xiaomi.com>
>
> diff --git a/net/udp/udp_psock_sendto_buffered.c b/net/udp/udp_psock_sendto_buffered.c
> index 0dcf892759..fbc44de0f1 100644
> --- a/net/udp/udp_psock_sendto_buffered.c
> +++ b/net/udp/udp_psock_sendto_buffered.c
> @@ -75,6 +75,7 @@
>  #include "neighbor/neighbor.h"
>  #include "udp/udp.h"
>  #include "devif/devif.h"
> +#include "utils/utils.h"
>
>  /****************************************************************************
>   * Pre-processor Definitions
> @@ -713,8 +714,21 @@ ssize_t psock_udp_sendto(FAR struct socket *psock, FAR const void *buf,
>          }
>        else
>          {
> +          unsigned int count;
> +          int blresult;
> +
> +          /* iob_copyin might wait for buffers to be freed, but if
> +           * network is locked this might never happen, since network
> +           * driver is also locked, therefore we need to break the lock
> +           */
> +
> +          blresult = net_breaklock(&count);
>            ret = iob_copyin(wrb->wb_iob, (FAR uint8_t *)buf, len, 0, false,
>                             IOBUSER_NET_SOCK_UDP);
> +          if (blresult >= 0)
> +            {
> +              net_restorelock(count);
> +            }
>          }
>
>        if (ret < 0)
>
> The problem is that iob_copyin may allocate more IOB buffers
> internally and will wait if no IOB is available.
> The old code called it without breaking the netlock, so the other path,
> which returns the pending IOBs to the pool once they finish sending,
> could not get the netlock.
> Hoping this case can give some tips.
>
> Thanks
> Xiang
>
> On Thu, Feb 20, 2020 at 6:50 AM Gregory Nutt <spudan...@gmail.com> wrote:
> >
> > > This sounds a lot like the problem I'm having with the SAMA5D36 Gigabit
> > > ethernet... I'm running into some kind of deadlock on long transfers that
> > > send packets very quickly. NuttX seems to run out of IOBs and then can't
> > > send or respond to network packets.
> > >
> > > I tried increasing the low priority worker threads to 2 (and also 3) but
> > > neither of them solved the problem.
> > >
> > > I'll look at the net_lock() to see if there's a way to release it.
> > >
> > > If you find a solution, I would love to know it! If I find one, I'll post
> > > it here.
> >
> > The first step in debugging a deadlock is to find what is stuck waiting
> > for what resource.
> >
> > Then find the logic that provides the resource that is being waited on.
> >
> > Then figure out why that logic is not running. Most likely, it would be
> > waiting on the low priority work queue.
> >
> > I have had to solve lots of problems like this. It is not really so
> > difficult once you understand the above things.

--
Adam Feuer <a...@starcat.io>
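
To make the pattern in the quoted patch easier to reason about, here is a minimal single-threaded model of it. This is only a sketch and not NuttX code: every `model_*` name and the `g_*` counters are hypothetical, and the point where a real system would block forever is modelled as a `-1` return. It assumes, as the thread describes, that the driver path that returns buffers to the pool needs the same lock the sender is holding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state (not NuttX data structures) */

static unsigned int g_lock_count;     /* Nesting count of the model lock */
static int g_free_bufs;               /* Buffers available in the pool */
static int g_inflight_bufs = 1;       /* Buffers the driver could return */

static void model_lock(void)
{
  g_lock_count++;
}

static void model_unlock(void)
{
  assert(g_lock_count > 0);
  g_lock_count--;
}

/* Drop every nesting level and report how many were held */

static int model_breaklock(unsigned int *count)
{
  if (g_lock_count == 0)
    {
      return -1;                      /* Lock was not held */
    }

  *count = g_lock_count;
  g_lock_count = 0;
  return 0;
}

/* Re-take the lock at the nesting level saved by model_breaklock() */

static void model_restorelock(unsigned int count)
{
  g_lock_count = count;
}

/* Driver TX-done path: it needs the lock, so it cannot make progress
 * while the sender holds it.
 */

static bool model_driver_txdone(void)
{
  if (g_lock_count > 0 || g_inflight_bufs == 0)
    {
      return false;
    }

  g_inflight_bufs--;
  g_free_bufs++;
  return true;
}

/* "Blocking" allocation: returns -1 where a real system would hang */

static int model_alloc(void)
{
  while (g_free_bufs == 0)
    {
      if (!model_driver_txdone())
        {
          return -1;
        }
    }

  g_free_bufs--;
  return 0;
}

/* Send path: optionally break the lock around the blocking allocation */

static int model_send(bool break_lock)
{
  unsigned int count = 0;
  int blresult = -1;
  int ret;

  model_lock();
  if (break_lock)
    {
      blresult = model_breaklock(&count);
    }

  ret = model_alloc();
  if (blresult >= 0)
    {
      model_restorelock(count);
    }

  model_unlock();
  return ret;
}
```

Starting from an empty pool with one buffer in flight, `model_send(false)` reports the stuck case because the driver path can never run, while `model_send(true)` lets the driver return the buffer and the allocation succeeds. This mirrors the three questions above: what is waiting (the sender, for a buffer), what provides it (the driver TX-done path), and why that logic is not running (it needs the lock the sender holds).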