Hi Nathan,

I just ran into this, this week on all the H7 boards in PX4.

This was broken by
https://github.com/apache/incubator-nuttx/pull/459/files#diff-8e09d7a85a1cc6cc65e0bb43d7aa751e3caaec9f9d5e824f9b741a1487ec9199L117-L138

I did not catch the implication of that change at the time.

The original idea we to leave sram4 out when the CONFIG_MM_REGIONS was 3

The side effect benefit of the ordering was that tasks allocated first used
DTCM :).

But that is not very deterministic. Adding the dtcm heap is a move in the
better direction. Then set CONFIG_MM_REGIONS to 1.

However, not having a "SPEED" parameter to allocate makes using the dtcm
very arch/chip specific.

As you have noted checking at build time only works for in tree
configurations. The quick fix to add an CONFIG_..._EXCLUDE_SRAM4... (naming
TBD) and light it in things that select BDMA to SRAM4 would a good to
addition, and help in some cases, but selecting it just on BDMA could be
wasteful.
Maybe that is just a warning if SRAM4 is added to the heap when BDMA is
enabled.

These more complex muti-power domain Soc, require more floor planning of
resource allocation on the board designer and users part.

I think a more flexible approach is would be to rework the heap addition
logic to be selectable and orderable.

Then a board designer can choose to add the RAM blocks to the heap and order
them as desired.

CONFIG_...ADD_ORDER_SRAM
CONFIG_... ADD_ORDER _SRAM12..
CONFIG_... ADD_ORDER _ SRAM3..
CONFIG_... ADD_ORDER _SRAM4..
CONFIG_... ADD_ORDER _DTCM..

(naming TBD)

0 is not used. 1 first added, N last added.

This gets defined into an array with the start addresses and lengths. The
code then processes them in a loop until all are added.

I think this would give us the all in one solution and not force the
solution back to calling add region in late init per board.

In the case of SRAM4 it is small and I do not think there is value in
carving it up with static and partial heap. (But when you are out of memory
times get desperate)

I spent quite a bit of time mulling over using the granual allocator and
placing the buffers in the drivers. My conclusion was this:
For the granual allocator you have to give it the static memory large enough
to meet ALL demand. (this creates a runtime size dependency the user has to
manage) After init, the memory will be given out to the drives and that is
dependent on device messages sizes (i.e MAX SPI payload which is for the
most part static, unless you have PNP SPI) So once the memory is allocated
out, it is the same as the static allocation of buffers in the drivers. It
is more code, more complexity, can fail at run time for out of memory or
DMA/Dcache alignment reasons. The compiler telling you SRAM4 is full is a
lot easier to debug :)
The other issues is there is no way to pass a buffer for 2 bytes to a driver
and DMA/Dcache validate it and fall back to non DMA. The in driver buffers
do add a mem copy but given the DMA/Dcache alignment issues, and after doing
the timing analysis is it very small addition compared to the actual IO
time.

David



-----Original Message-----
From: Nathan Hartman [mailto:hartman.nat...@gmail.com]
Sent: Thursday, March 25, 2021 4:28 PM
To: dev@nuttx.apache.org
Subject: How to ensure HEAP will not overlap static DMA buffer?

tl;dr: How do I statically allocate DMA buffers and prevent the heap
from overlapping them?

Details: On STM32H7, BDMA can only reach SRAM4 (64 KB starting at
0x3800:0000). The armv7m DCache line size means the buffers must be
allocated on a cache line boundary. So a DMA buffer must meet 2
requirements: (1) be in SRAM4, and (2) be aligned and padded to 32
byte increments.

Ok, I can do this easily with:

static volatile uint8_t my_bdma_buf[128] __attribute__ ((section
(".sram4"))) __attribute__ ((aligned (32)));

or equivalent.

But then the heap will overlap this memory and my buffer will be
clobbered because arch/arm/src/stm32h7/stm32_allocateheap.c
arm_addregion() does this:

addregion (SRAM4_START, SRAM4_END - SRAM4_START, "SRAM4");

Looking into it, I see that in arch/arm/src/stm32h7/stm32_spi.c,
g_spi6_txbuf[] and g_spi6_rxbuf[] are declared like this:

static uint8_t g_spi6_txbuf[SPI6_DMABUFSIZE_ADJUSTED]
SPI6_DMABUFSIZE_ALGN locate_data(".sram4");
static uint8_t g_spi6_rxbuf[SPI6_DMABUFSIZE_ADJUSTED]
SPI6_DMABUFSIZE_ALGN locate_data(".sram4");

So they will suffer that same problem. (I'm pretty sure that's a bug,
unless I missed something.)

I discovered this the hard way, i.e., on my board, NSH suddenly
wouldn't start. In reality, it does start but it turns out that when
it tries to print the greeting message, it fails on the very first
character ('\n') and exits. Why? I single-stepped all the way into the
guts of the serial logic, and found that in fs/vfs/fs_write.c,
file_write(), it fetches the inode and tests if it's valid (!inode ||
!inode->u.i_ops || !inode->u.i_ops->write) or returns -EBADF. That's
where it fails, because it seems that the inode is allocated at
0x3800:0070, right on top of my BDMA buffers!

I don't know which question to ask, but I think it's one of these:

(1) Is there a standard(-ish) way to allocate a buffer that is both
aligned to a specified boundary AND in a specified region of memory?

(2) Is there a standard way to make a hole in the heap so that nothing
else in NuttX will overwrite the DMA buffer?

(Yes, I see that I can fiddle with CONFIG_MM_REGIONS, but then only
the 512 KB of AXI SRAM will be in the heap, and none of the other
regions, SRAM1,2,3, etc., and I don't think I want that.)

Thanks,
Nathan

Reply via email to