Following further investigation, this does not appear to be a traditional 
case of memory corruption. The problem is exhibited by the following code in a 
single-threaded component:

#include <camkes.h>
#include <sel4platsupport/io.h>
#include <malloc.h> /* declares memalign */
#include <stdio.h>

void* debug_memalign(size_t align, size_t size)
{
    void* addr = memalign(align, size);
    /* %zx is the correct conversion for size_t arguments */
    printf("memalign: align = 0x%zx, size = 0x%zx. Returned address = %p\n",
           align, size, addr);
    return addr;
}

int run_main(ps_io_ops_t *io_ops)
{
    debug_memalign(0x40, 0x1000);
    debug_memalign(0x40, 0x1000);

    /* <snip, continue processing> */

    // Loop forever
    while (1);
}
CAMKES_POST_INIT_MODULE_DEFINE(run_main_, run_main);

This code, performing two calls to memalign as the first processing within the 
component, results in the following output:

ELF-loader started on CPU: ARM Ltd. Cortex-A53 r0p4
  paddr=[408b3000..40c21117]
No DTB passed in from boot loader.
Looking for DTB in CPIO archive...found at 409ff090.
Loaded DTB from 409ff090.
   paddr=[4023f000..40248fff]
ELF-loading image 'kernel' to 40000000
  paddr=[40000000..4023efff]
  vaddr=[ffffff8040000000..ffffff804023efff]
  virt_entry=ffffff8040000000
ELF-loading image 'capdl-loader' to 40249000
  paddr=[40249000..404c1fff]
  vaddr=[400000..678fff]
  virt_entry=408e78
Enabling MMU and paging
Jumping to kernel-image entry point...

Warning:  gpt_cntfrq 8333333, expected 8000000
Bootstrapping kernel
available phys memory regions: 1
  [40000000..c0000000]
reserved virt address space regions: 3
  [ffffff8040000000..ffffff804023f000]
  [ffffff804023f000..ffffff8040249000]
  [ffffff8040249000..ffffff80404c2000]
Booting all finished, dropped to user space
memalign: align = 0x40, size = 0x1000. Returned address = 0x57d1c0
memalign: align = 0x40, size = 0x1000. Returned address = 0x57d1c0

As can be seen, the same address is returned for both calls, and there is no 
earlier processing in this case that could corrupt the bookkeeping of the 
memalign / malloc routines. I have a few observations that may be of interest:

1. As previously noted, if the two calls to ‘memalign’ are replaced with calls 
to ‘malloc’ then no problems are seen; two non-overlapping memory regions are 
allocated.

2. The behaviour appears to depend on the other code / data linked into the 
executable. If the calls to ‘memalign’ are the only functionality performed by 
the component, two non-overlapping memory regions are allocated as expected. 
If, however, the calls to ‘memalign’ are followed by calls to the U-Boot code 
we are porting to seL4, the unexpected behaviour is observed.

3. The behaviour is being observed on a board for which we are in the process 
of adding support (the Avnet MaaXBoard, an i.MX8 based board). I can confirm, 
however, that seL4test passes on the board, as do tests of an ethernet driver 
(using the Ethdriver and PicoServer global components) and tests of a MMC/SDHC 
driver. As such, there is a degree of confidence in the board’s level of support.
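
To turn the "same address returned twice" observation into an automatic check, 
the reported allocations can be compared as address ranges. The following is a 
minimal sketch only; regions_overlap and is_aligned are hypothetical helper 
names, not part of muslc, CAmkES or U-Boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [a, a + a_size) and [b, b + b_size)
 * share at least one byte. */
static bool regions_overlap(const void *a, size_t a_size,
                            const void *b, size_t b_size)
{
    uintptr_t a_start = (uintptr_t)a;
    uintptr_t b_start = (uintptr_t)b;
    return a_start < b_start + b_size && b_start < a_start + a_size;
}

/* Hypothetical helper: true if addr is a multiple of align.
 * align must be a non-zero power of two. */
static bool is_aligned(const void *addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)addr & (align - 1)) == 0;
}
```

With the values from the log above (0x57d1c0 returned twice for a size of 
0x1000), regions_overlap reports an overlap, and is_aligned confirms that the 
address does satisfy the requested 0x40 alignment.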

Any insights or suggestions on how to proceed with determining the cause of 
this issue would be greatly appreciated.

Thanks,
Stephen

On 17 Mar 2022, at 23:30, WILLIAMS Stephen <[email protected]> wrote:

Thanks for the information.

The issue is being observed within a single-threaded CAmkES component, 
therefore I’m assuming that thread safety of the muslc functions is not 
relevant here.

In an attempt to investigate memory corruption I have run the software under 
valgrind on my host machine, which does not highlight any issues. The 
interaction with the actual devices had to be simulated to perform this run on 
the host; however, I believe that all of the same memory allocation and access 
routines should have been exercised. This suggests that memory corruption may 
not be the cause.

One potentially interesting observation is that if I replace all calls to 
memalign with the equivalent calls to malloc (noting that the alignment appears 
to be for performance rather than functional correctness reasons) I no longer 
have overlapping memory regions allocated and the code functions as expected.
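
An experiment that sits between the two cases would be to keep the alignment 
but obtain the memory through malloc only. The following is a rough sketch 
under stated assumptions: it uses the common over-allocate-and-round-up 
technique, it is not how muslc implements memalign, and malloc_aligned / 
free_aligned are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: aligned allocation built only on malloc.
 * align must be a non-zero power of two. */
static void *malloc_aligned(size_t align, size_t size)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Worst-case padding plus room to remember the raw malloc pointer. */
    size_t extra = align - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra) {
        return NULL;
    }

    void *raw = malloc(size + extra);
    if (raw == NULL) {
        return NULL;
    }

    /* Skip the slot for the saved pointer, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* Store the raw pointer immediately before the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Releases a block obtained from malloc_aligned. */
static void free_aligned(void *ptr)
{
    if (ptr != NULL) {
        free(((void **)ptr)[-1]);
    }
}
```

Note that a pointer returned by this sketch must be released with 
free_aligned rather than free, so any ported code that passes memalign'd 
buffers to free would need adjusting before trying it.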

Thanks,
Stephen

On 16 Mar 2022, at 23:23, Kent Mcleod <[email protected]> wrote:


On Thu, Mar 17, 2022 at 8:26 AM WILLIAMS Stephen via Devel
<[email protected]> wrote:

Hi,

I’m currently working on a project porting drivers from U-Boot to seL4 and have 
run into an unexpected problem seemingly triggered by use of memalign within 
the U-Boot drivers.

What I am seeing is that calls to memalign from within my CAmkES component can 
return pointers to regions which overlap with those previously returned by 
malloc. Obviously this leads to the two allocated regions trampling over each 
other, resulting in corruption of data.

I’m at a complete loss to explain this behaviour and would be very grateful to 
receive any suggestions or pointers.


Both malloc and memalign in camkes are provided by our fork of
libmuslc (https://github.com/sel4/musllibc/). Internally, memalign
calls malloc, so it seems like your issue can be reduced to multiple
calls to malloc returning overlapping regions. This could be for a
couple of reasons:
- Within the default camkes runtime, muslc functions such as malloc
aren't thread safe and so must be called from critical sections
guarded by a lock to avoid races. Many camkes components use a global
lock when performing operations that mutate state:
https://github.com/seL4/global-components/blob/master/components/TimeServer/src/time_server.c#L152,
or they don't use dynamic memory allocation after initialization (as
initialization is single threaded). This lack of thread safety is a
bit nasty and the runtime should do more to protect developers from
this, but currently I don't think it does.
- You have memory corruption somewhere else that's causing malloc's
bookkeeping structures to be corrupted.
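
The global-lock pattern from the first point can be sketched as follows. This 
is only an illustration under stated assumptions: a POSIX mutex stands in for 
whatever lock a component actually uses, and locked_malloc / locked_free are 
hypothetical names, not CAmkES or muslc APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Assumption: one global lock serialises every allocator call. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: malloc inside a critical section. */
static void *locked_malloc(size_t size)
{
    int rc = pthread_mutex_lock(&alloc_lock);
    assert(rc == 0);
    void *p = malloc(size);
    rc = pthread_mutex_unlock(&alloc_lock);
    assert(rc == 0);
    (void)rc; /* silences the unused warning when NDEBUG is defined */
    return p;
}

/* Hypothetical wrapper: free inside the same critical section. */
static void locked_free(void *p)
{
    int rc = pthread_mutex_lock(&alloc_lock);
    assert(rc == 0);
    free(p);
    rc = pthread_mutex_unlock(&alloc_lock);
    assert(rc == 0);
    (void)rc;
}
```

Every thread that allocates or frees would have to go through these wrappers 
for the lock to be effective; a single unguarded call to malloc or free from 
another thread would reintroduce the race.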



Thanks for your help,
Stephen
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]

