Hello,

miquel.ray...@bootlin.com wrote on Fri, 27 Oct 2023 18:20:25 +0200:

> Sequential DMA bursts improve NIC/RAM usage thanks to the basic NIC
> hardware optimizations available when performing in-order sequential
> accesses. This can be further enforced with the IPU DMA locking
> mechanism, which basically prevents any other IP from accessing the
> interconnect for a longer time while performing up to 8 sequential DMA
> bursts. The drawback is lower availability for short time periods and
> delayed accesses, which may cause problems with latency-sensitive systems
> (typically, the network might suffer from high drop rates). This is even
> more visible with larger displays requiring even more RAM bandwidth.
> 
> Issues have been observed on IMX6Q. The setup featured a 60Hz 1024x768
> LVDS display just showing a static picture (thus no CPU usage, only
> background DMA bringing the picture to the display engine). When
> performing full-speed iperf3 uplink tests with the FEC, almost no drop
> was observed, whereas the drop rate would rise above 50% when limiting
> the bandwidth to 1Mb/s (on a 100Mb/s link). The exact same test with the
> display pipeline disabled would show little to no drop. The LP-DDR3 chip
> on the module allows up to ~53MiB every 1/60th of a second, and the
> display pipeline consumes approximately 10MiB of this bandwidth, and is
> thus active about 20% of the time in each time slot.
> 
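
As a rough sanity check of the figures quoted above, the share of RAM
bandwidth taken by the display can be recomputed from the two estimates
(~53MiB available and ~10MiB consumed per 1/60th of a second). This is
only a sketch: the function and constant names are made up for
illustration, and the inputs are the approximate values from the text,
not new measurements. The ratio comes out at about 19%, which is the
"20%" mentioned above.

```python
# Approximate figures taken from the text; names are illustrative only.
FRAME_RATE_HZ = 60             # 60Hz display refresh
RAM_MIB_PER_FRAME = 53.0       # ~53MiB available per 1/60th of a second
DISPLAY_MIB_PER_FRAME = 10.0   # ~10MiB consumed by the display pipeline


def bandwidth_share(used_mib: float, available_mib: float) -> float:
    """Return the fraction of the available bandwidth that is used."""
    if available_mib <= 0:
        raise ValueError("available bandwidth must be positive")
    return used_mib / available_mib


def per_second(mib_per_frame: float, frame_rate_hz: int = FRAME_RATE_HZ) -> float:
    """Convert a per-frame-period amount into MiB per second."""
    return mib_per_frame * frame_rate_hz


if __name__ == "__main__":
    share = bandwidth_share(DISPLAY_MIB_PER_FRAME, RAM_MIB_PER_FRAME)
    print(f"display share of RAM bandwidth: {share:.1%}")
    print(f"RAM bandwidth: {per_second(RAM_MIB_PER_FRAME):.0f} MiB/s")
    print(f"display traffic: {per_second(DISPLAY_MIB_PER_FRAME):.0f} MiB/s")
```
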
> One particular feature of the IPU DMA controller (IDMAC) is the ability
> to serialize DMA bursts and to lock further interconnect accesses when
> doing so. Experimentally, disabling the locking brought the drop rate
> from 50% down to 10%. A few more percent could be gained by setting the
> burst number to 1. It seems this huge difference could be explained by a
> possible hardware conflict between the locking feature and some QoS
> logic. Indeed, on IMX6Q, the NIC-301 manages priorities and by default
> will elect ENET's requests (priority 2) above IPU's requests (priority
> 0). But the QoS seems to only be valid above a certain threshold, which
> is 4 consecutive DMA bursts in the case of the IPU. It was indeed
> observed that lowering the number of bursts from 8 to 4 led to a
> significant increase in the stability of Ethernet transfers. IOW, it
> looks like when the display pipeline performs DMA
> transfers, incoming DMA requests from other master devices on the
> interconnect are delayed too much (or canceled).
> 
> I cannot explain why, on the Ethernet MAC side, some uDMA
> transfers would never reach completion, especially without any
> notification or error. All uplink transfers are properly queued at the FEC level
> and more importantly, the corresponding interrupts are fired upon
> "proper transmission" and report no error whatsoever (note: there is no
> actual way to know the uDMA internal controller could not fetch the
> data, only MAC errors could be reported at this stage).
> 
> As a solution, we might want to prevent these DMA bursts from being
> queued together. Maybe the IMX6Q is primarily used for its graphics
> capabilities, but when the network (and other RAM-consuming subsystems)
> also matter, it may be relevant to apply this workaround in order to
> help them fetch from RAM more reliably.
> 
> Signed-off-by: Miquel Raynal <miquel.ray...@bootlin.com>
> ---
> 
> Hello,
> 
> This really is an RFC as the bug was also observed on v6.5 but the fix
> proposed here was written and tested on a v4.14 kernel. I want to
> discuss the approach and ideally get some feedback from imx6 experts who
> know the SoC internals before publishing a clean series. There is a lot
> of guessing in this workaround, besides the experimental measurements I
> managed to take. I would be glad if someone could shed some light or
> involve knowledgeable people in this conversation.
> 
> The initial report was there and mainly focused on the network
> subsystem:
> https://lore.kernel.org/netdev/18b72fdb-d24a-a416-ffab-3a15b281a...@katalix.com/T/#md265d6da81b8fb6b85e3adbb399bcda79dfc761c
> In this thread I made wrong observations because, to speed up my test
> cycles, I dropped support for DRM, SND and USB, as these subsystems
> seemed totally irrelevant. It actually had a strong impact.
> 
> In the end, I really think there is something wrong with the locking of
> IPU DMA bursts when mixed with the QoS of the NIC.

Further investigation led to the DDR configuration itself. The
system worked perfectly apart from the Ethernet drop rate, which was
abnormally high, and it turns out that just changing a bit in the DDR
reset pad configuration fixed it. I cannot explain exactly what the root
cause was, but it is possible that the DDR was in a relatively unstable
state because the incomplete pad configuration prevented the
power-on/reset procedure from being followed correctly.

Here is the U-Boot thread I've started: 
https://lore.kernel.org/u-boot/20231117150044.1792080-1-miquel.ray...@bootlin.com/

Thanks,
Miquèl
