Hi Xiang, Please note what I describe is not caused by my code using multiple threads, but is happening on Nuttx upstream. My code is single threaded, but s32k1xx_flexcan driver (and several other Socket CAN drivers as they all seem to be derived from the same code base) does some things on the thread I call write() from, and some other things on CANWORK work queue thread. My understanding is that net code is structured so work queue threads are used, generally, but in Socket CAN drivers this was "waived" to avoid data loss, causing the problem I describe.
Thanks, Carlos On Tue, Jan 3, 2023 at 6:56 PM Xiang Xiao <xiaoxiang781...@gmail.com> wrote: > Since tx/rx share the same d_len/d_buf, you must do send/recv in one and > only thread(either by system work thread or driver dedicated thread) to > avoid the race condition you describe below. > > On Wed, Jan 4, 2023 at 1:45 AM Carlos Sanchez > <carlossanc...@geotab.com.invalid> wrote: > > > Hi all, > > > > I am observing an extrange behavior: under heavy-error CAN TX scenario > (no > > acks so TX fails always), usually after the second call to write() my > > writes fail. This is expected as s32k1xx_flexcan has two TX mailboxes and > > from my understanding of the code there is no other buffering (on this > > specific CAN driver at least). > > > > However, if I enable CAN errors, and depending on runtime sync conditions > > (basically, if I put a breakpoint on s32k1xx_txpoll) then all the writes > > after the first silently fail, without really trying to send anything (I > > see on ESR2 register than second TX mailbox does not really become > active). > > > > After some debugging, I have seen that the CANWORK=LPWORK thread has > > scheduled calls to s32k1xx_error_work, overwrites d_buf/d_len to send the > > error frames in. But TX polling does not always happen in CANWORK thread: > > it does when it comes from s32k1xx_txdone_work, but not when it comes > from > > s32k1xx_txavail_work, which, despite the name, is called directly on > > s32k1xx_txavail context (which is the application context). What is > > happening in my case, is txavail_work->...->devif_poll->can_poll is > setting > > d_buf/d_len to the packed to be sent, but before s32k1xx_txpoll checks it > > (due to my breakpoint), s32k1xx_error kicks in, "steals" d_buf/d_len to > > setup the error frame and calls can_input. The frame which was set up by > > the polling sequence gets silently discarded. > > > > I have tried setting s32k1xx_txavail_work inside CANWORK, but this fails > > because can_sendmsg checks immediately for non-blocking writes; placing > the > > polling in the work queue would cause all non-blocking writes to fail. > > > > Related to this, I also fail to see the arbitration between TX/RX for > > d_buf/d_len. From what I see, the same problem I am describing could > happen > > by s32k1xx_receive "stealing" d_buf/d_len, same as s32k1xx_error is doing > > in my case. But this is only a thought, I have not observed it. > > > > A possible clean solution is to use another buffer, but it is complex and > > would mean losing the direct connection between the write and the HW TX > > (which might be useful in general and it is for my use case). A quicker > > solution would be for s32k1xx_error to lock the network, forcing it to > wait > > until txavail_work is done. This would solve my case. My second concern > is > > more difficult to solve as comments in the code explicitly say RX cannot > be > > delayed to the work queue or CAN frames would be lost. > > > > Any ideas or anything I might be missing here? > > > > Thanks, > > > > Carlos > > > > -- > > > > Carlos Sanchez (he, him, his) > > Geotab > > > > Embedded Systems Developer Team Lead | Europe > > > > Visit > > > > www.geotab.com > > > > Twitter <https://twitter.com/geotab> | Facebook > > <https://www.facebook.com/Geotab> | YouTube > > <https://www.youtube.com/user/MyGeotab> | LinkedIn > > <https://www.linkedin.com/company/geotab/> > > > -- Carlos Sanchez (he, him, his) Geotab Embedded Systems Developer Team Lead | Europe Visit www.geotab.com Twitter <https://twitter.com/geotab> | Facebook <https://www.facebook.com/Geotab> | YouTube <https://www.youtube.com/user/MyGeotab> | LinkedIn <https://www.linkedin.com/company/geotab/>