Hi Peter,

I have submitted a PR with a couple of fixes to s32k1xx_flexcan.c:
https://github.com/apache/nuttx/pull/8058

Among those, there is a fix to MAXMB field initialization in MCR. The
effects of that wrong init is the HW will use mailboxes the driver does not
configure nor check, causing RX data loss. This use gets more frequent as
the CAN frame arrival rate overcomes the interrupt handler frequency, as
the first mailboxes are not empty. I am wondering if this might have
affected your measurements when you decided not to use a work queue for RX
because you observed frame loss.

As I do not have the means to test heavy CAN traffic flow, I am just
pointing it to you. In my code, I have changed to a work queue based RX and
I have seen no problems so far (but IMHO it would be wiser to retest on
whatever setup you detected the frame loss originally).

BR

Carlos

On Wed, Jan 4, 2023 at 1:37 PM Peter van der Perk <peter.vanderp...@nxp.com>
wrote:

> Hi,
>
> It seems that calling can_input directly from IRQ got broken since the IOB
> rewrite.
> Before can_input only used dev->d_appdata, but now can_input overwrites
> the dev->d_buf pointer as well.
>
> https://github.com/apache/nuttx/blob/779a610ca3ba495640b49d6c36bce89784955e0d/net/can/can_input.c#L231
> dev-
> <https://github.com/apache/nuttx/blob/779a610ca3ba495640b49d6c36bce89784955e0d/net/can/can_input.c#L231dev->>d_len
> always has been fixed to sizeof(struct can_frame) or sizeof(struct
> canfd_frame) depending on kconfig setting.
>
> Once the whole IOB rewrite gets in, is mature and is fully documented I
> can took a look at the mechanism again.
>
> Short term you could either go back to an older version of NuttX, or try
> to schedule the workqueue for can_input and see if you can enough
> throughput.
>
> Yours sincerely,
>
> Peter van der Perk
>
> -----Original Message-----
> From: raiden00pl <raiden0...@gmail.com>
> Sent: Wednesday, January 4, 2023 10:01 AM
> To: dev@nuttx.apache.org
> Subject: Re: d_len/d_buf arbitration for s32k1xx_flexcan
>
> Related issue:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fnuttx%2Fissues%2F5142&data=05%7C01%7Cpeter.vanderperk%40nxp.com%7C99afe46e15344b2883eb08daee32418c%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C638084196858222395%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=HIk%2Fza58GP94ZSc1ANtynWEz3fD3o6PV05Kv90ORYig%3D&reserved=0
> .
> Not using a work queue to handle CAN RX is basically wrong. Unfortunately
> all NXP SocketCAN drivers are affected.
>
> wt., 3 sty 2023 o 19:38 Xiang Xiao <xiaoxiang781...@gmail.com> napisaƂ(a):
>
> >   Sorry, "you must do..." may confuse you. What I mean is the CAN driver.
> >
> > On Wed, Jan 4, 2023 at 2:30 AM Carlos Sanchez
> > <carlossanc...@geotab.com.invalid> wrote:
> >
> > > Hi Xiang,
> > >
> > > Please note what I describe is not caused by my code using multiple
> > > threads, but is happening on Nuttx upstream. My code is single
> > > threaded, but s32k1xx_flexcan driver (and several other Socket CAN
> > > drivers as they all seem to be derived from the same code base) does
> > > some things on the thread I call write() from, and some other things
> > > on CANWORK work queue thread.
> > > My understanding is that net code is structured so work queue
> > > threads are used, generally, but in Socket CAN drivers this was
> > > "waived" to avoid
> > data
> > > loss, causing the problem I describe.
> > >
> > > Thanks,
> > >
> > > Carlos
> > >
> > > On Tue, Jan 3, 2023 at 6:56 PM Xiang Xiao
> > > <xiaoxiang781...@gmail.com>
> > > wrote:
> > >
> > > > Since tx/rx share the same d_len/d_buf, you must do send/recv in
> > > > one
> > and
> > > > only thread(either by system work thread or driver dedicated
> > > > thread) to avoid the race condition you describe below.
> > > >
> > > > On Wed, Jan 4, 2023 at 1:45 AM Carlos Sanchez
> > > > <carlossanc...@geotab.com.invalid> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I am observing an extrange behavior: under heavy-error CAN TX
> > scenario
> > > > (no
> > > > > acks so TX fails always), usually after the second call to
> > > > > write() my writes fail. This is expected as s32k1xx_flexcan has
> > > > > two TX mailboxes
> > > and
> > > > > from my understanding of the code there is no other buffering
> > > > > (on
> > this
> > > > > specific CAN driver at least).
> > > > >
> > > > > However, if I enable CAN errors, and depending on runtime sync
> > > conditions
> > > > > (basically, if I put a breakpoint on s32k1xx_txpoll) then all
> > > > > the
> > > writes
> > > > > after the first silently fail, without really trying to send
> > > > > anything
> > > (I
> > > > > see on ESR2 register than second TX mailbox does not really
> > > > > become
> > > > active).
> > > > >
> > > > > After some debugging, I have seen that the CANWORK=LPWORK thread
> > > > > has scheduled calls to s32k1xx_error_work, overwrites
> > > > > d_buf/d_len to send
> > > the
> > > > > error frames in. But TX polling does not always happen in
> > > > > CANWORK
> > > thread:
> > > > > it does when it comes from s32k1xx_txdone_work, but not when it
> > > > > comes
> > > > from
> > > > > s32k1xx_txavail_work, which, despite the name, is called
> > > > > directly on s32k1xx_txavail context (which is the application
> > > > > context). What is happening in my case, is
> > > > > txavail_work->...->devif_poll->can_poll is
> > > > setting
> > > > > d_buf/d_len to the packed to be sent, but before s32k1xx_txpoll
> > checks
> > > it
> > > > > (due to my breakpoint), s32k1xx_error kicks in, "steals"
> > > > > d_buf/d_len
> > to
> > > > > setup the error frame and calls can_input. The frame which was
> > > > > set up
> > > by
> > > > > the polling sequence gets silently discarded.
> > > > >
> > > > > I have tried setting s32k1xx_txavail_work inside CANWORK, but
> > > > > this
> > > fails
> > > > > because can_sendmsg checks immediately for non-blocking writes;
> > placing
> > > > the
> > > > > polling in the work queue would cause all non-blocking writes to
> > fail.
> > > > >
> > > > > Related to this, I also fail to see the arbitration between
> > > > > TX/RX for d_buf/d_len. From what I see, the same problem I am
> > > > > describing could
> > > > happen
> > > > > by s32k1xx_receive "stealing" d_buf/d_len, same as s32k1xx_error
> > > > > is
> > > doing
> > > > > in my case. But this is only a thought, I have not observed it.
> > > > >
> > > > > A possible clean solution is to use another buffer, but it is
> > > > > complex
> > > and
> > > > > would mean losing the direct connection between the write and
> > > > > the HW
> > TX
> > > > > (which might be useful in general and it is for my use case). A
> > quicker
> > > > > solution would be for s32k1xx_error to lock the network, forcing
> > > > > it
> > to
> > > > wait
> > > > > until txavail_work is done. This would solve my case. My second
> > concern
> > > > is
> > > > > more difficult to solve as comments in the code explicitly say
> > > > > RX
> > > cannot
> > > > be
> > > > > delayed to the work queue or CAN frames would be lost.
> > > > >
> > > > > Any ideas or anything I might be missing here?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Carlos
> > > > >
> > > > > --
> > > > >
> > > > > Carlos Sanchez (he, him, his)
> > > > > Geotab
> > > > >
> > > > > Embedded Systems Developer Team Lead | Europe
> > > > >
> > > > > Visit
> > > > >
> > > > > https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2
> > > > > Fwww.geotab.com%2F&data=05%7C01%7Cpeter.vanderperk%40nxp.com%7C9
> > > > > 9afe46e15344b2883eb08daee32418c%7C686ea1d3bc2b4c6fa92cd99c5c3016
> > > > > 35%7C0%7C1%7C638084196858222395%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> > > > > MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C300
> > > > > 0%7C%7C%7C&sdata=V7yT6O%2FNT%2B4zbERMFnffrayp59Kgjlqo9o4U4bF92H4
> > > > > %3D&reserved=0
> > > > >
> > > > > Twitter
> > > > > <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F
> > > > > %2Ftwitter.com%2Fgeotab&data=05%7C01%7Cpeter.vanderperk%40nxp.co
> > > > > m%7C99afe46e15344b2883eb08daee32418c%7C686ea1d3bc2b4c6fa92cd99c5
> > > > > c301635%7C0%7C1%7C638084196858222395%7CUnknown%7CTWFpbGZsb3d8eyJ
> > > > > WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%
> > > > > 7C3000%7C%7C%7C&sdata=1DlNx9Iy%2BvxErS2AsgU43uut0IMDF29QDeL5usCu
> > > > > Lto%3D&reserved=0> | Facebook
> > > > > <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F
> > > > > %2Fwww.facebook.com%2FGeotab&data=05%7C01%7Cpeter.vanderperk%40n
> > > > > xp.com%7C99afe46e15344b2883eb08daee32418c%7C686ea1d3bc2b4c6fa92c
> > > > > d99c5c301635%7C0%7C1%7C638084196858222395%7CUnknown%7CTWFpbGZsb3
> > > > > d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn
> > > > > 0%3D%7C3000%7C%7C%7C&sdata=BJKh9GlKnnMUZ1d31mW%2B%2FNrWoVUS5lL6t
> > > > > 8SEhVdt2XM%3D&reserved=0> | YouTube
> > > > > <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F
> > > > > %2Fwww.youtube.com%2Fuser%2FMyGeotab&data=05%7C01%7Cpeter.vander
> > > > > perk%40nxp.com%7C99afe46e15344b2883eb08daee32418c%7C686ea1d3bc2b
> > > > > 4c6fa92cd99c5c301635%7C0%7C1%7C638084196858222395%7CUnknown%7CTW
> > > > > FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLC
> > > > > JXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Yw5fWwK9J0sfLkwzWqdvqj7Bn%2BZ
> > > > > xwfcsMvjsPxOf9fg%3D&reserved=0> | LinkedIn
> > > > > <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F
> > > > > %2Fwww.linkedin.com%2Fcompany%2Fgeotab%2F&data=05%7C01%7Cpeter.v
> > > > > anderperk%40nxp.com%7C99afe46e15344b2883eb08daee32418c%7C686ea1d
> > > > > 3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C638084196858222395%7CUnknown
> > > > > %7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1ha
> > > > > WwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=u5FaRsEC2aUMRWfAlt98v%2F
> > > > > 1dmbCNn7%2FMgvNXoSmzyUU%3D&reserved=0>
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Carlos Sanchez (he, him, his)
> > > Geotab
> > >
> > > Embedded Systems Developer Team Lead | Europe
> > >
> > > Visit
> > >
> > > https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww
> > > .geotab.com%2F&data=05%7C01%7Cpeter.vanderperk%40nxp.com%7C99afe46e1
> > > 5344b2883eb08daee32418c%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7
> > > C638084196858222395%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ
> > > IjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=V7y
> > > T6O%2FNT%2B4zbERMFnffrayp59Kgjlqo9o4U4bF92H4%3D&reserved=0
> > >
> > > Twitter
> > > <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Ft
> > > witter.com%2Fgeotab&data=05%7C01%7Cpeter.vanderperk%40nxp.com%7C99af
> > > e46e15344b2883eb08daee32418c%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%
> > > 7C1%7C638084196858222395%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDA
> > > iLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdat
> > > a=1DlNx9Iy%2BvxErS2AsgU43uut0IMDF29QDeL5usCuLto%3D&reserved=0> |
> > > Facebook
> > > <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fw
> > > ww.facebook.com%2FGeotab&data=05%7C01%7Cpeter.vanderperk%40nxp.com%7
> > > C99afe46e15344b2883eb08daee32418c%7C686ea1d3bc2b4c6fa92cd99c5c301635
> > > %7C0%7C1%7C638084196858222395%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
> > > AwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
> > > &sdata=BJKh9GlKnnMUZ1d31mW%2B%2FNrWoVUS5lL6t8SEhVdt2XM%3D&reserved=0
> > > > | YouTube
> > > <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fw
> > > ww.youtube.com%2Fuser%2FMyGeotab&data=05%7C01%7Cpeter.vanderperk%40n
> > > xp.com%7C99afe46e15344b2883eb08daee32418c%7C686ea1d3bc2b4c6fa92cd99c
> > > 5c301635%7C0%7C1%7C638084196858222395%7CUnknown%7CTWFpbGZsb3d8eyJWIj
> > > oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%
> > > 7C%7C%7C&sdata=Yw5fWwK9J0sfLkwzWqdvqj7Bn%2BZxwfcsMvjsPxOf9fg%3D&rese
> > > rved=0> | LinkedIn
> > > <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fw
> > > ww.linkedin.com%2Fcompany%2Fgeotab%2F&data=05%7C01%7Cpeter.vanderper
> > > k%40nxp.com%7C99afe46e15344b2883eb08daee32418c%7C686ea1d3bc2b4c6fa92
> > > cd99c5c301635%7C0%7C1%7C638084196858222395%7CUnknown%7CTWFpbGZsb3d8e
> > > yJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C
> > > 3000%7C%7C%7C&sdata=u5FaRsEC2aUMRWfAlt98v%2F1dmbCNn7%2FMgvNXoSmzyUU%
> > > 3D&reserved=0>
> > >
> >
>


-- 

Carlos Sanchez (he, him, his)
Geotab

Embedded Systems Developer Team Lead | Europe

Visit

www.geotab.com

Twitter <https://twitter.com/geotab> | Facebook
<https://www.facebook.com/Geotab> | YouTube
<https://www.youtube.com/user/MyGeotab> | LinkedIn
<https://www.linkedin.com/company/geotab/>

Reply via email to