Hi,

We are trying to enable SMP on a dual-core ARM Cortex-R8. We are using a kernel based on NuttX-7.31, with added proprietary support for our embedded Cortex-R8 SoC, which has separate instruction and data caches. We modeled our Cortex-R SMP implementation on the Cortex-A implementation and the i.MX6 port, using code as recent as February 2020.
Our current stress test enables our own features, which consist of about six threads plus some low-priority work queue work, and then runs ostest many times.

We would like to run with both the I-cache and the D-cache enabled. However, we spent some time trying to get coherent D-caches working with the SCU with no luck, and we noticed the stability issues called out in the Sabre-6Quad README, so we settled on I-cache only.

Even with only the I-cache enabled, we have had trouble booting into the OS. When we enable more features to start at boot, both cores appear to get stuck in the ldrex/strex loop in up_testset() (up_testset.S). We cannot reproduce this with the I-cache disabled. After we added DSB instructions before and after the ldrex/strex pair, the hang went away. Is there anything you are aware of that would cause this?

With the DSBs in place, though, we start hitting system asserts related to critical sections and sched_lock()/sched_unlock() during stress testing. For example, we hit the following assert in enter_critical_section(), where the TCB's IRQ lock count is 0 but g_cpu_irqset indicates that the calling CPU already holds the IRQ lock. Most recently this happened when nxsem_post() was called from ramlog_read():

    DEBUGASSERT((g_cpu_irqset & (1 << cpu)) == 0);

We have also hit asserts during ostest, in the waiter_action() function of the signest test. In the sched_unlock() call after nest_count is incremented, the TCB's lockcount is 1, as expected, but the locks are not held by the CPU in question:

    DEBUGASSERT(g_cpu_schedlock == SP_LOCKED && (g_cpu_lockset & (1 << cpu)) != 0);

We think the problem is one of two things: either our changes to up_testset are somehow breaking atomic access to the IRQ/scheduler locks and the corresponding per-CPU set bits that are updated under their set locks, or we pulled in architecture-specific changes that require some generic OS code to be updated as well.
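To make the invariant behind the first assert concrete, here is a minimal, portable C11 sketch. It is a simplified model, not NuttX code, and every name in it (model_spinlock_t, model_testset, model_enter, model_leave, g_irqlock, g_setlock, g_irqset, g_irqcount) is hypothetical. model_testset() plays the role of up_testset(): an atomic exchange that returns the previous value, which GCC typically compiles to an ldrexb/strexb retry loop on ARMv7. Acquire/release ordering stands in for explicit barriers; on ARMv7 compilers usually implement it with DMB, so this does not reproduce the DSB workaround described above. The bookkeeping mirrors what enter_critical_section() checks: a CPU's bit in the holder set must be clear whenever its nesting count is zero, and set whenever that count is non-zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for NuttX's SP_UNLOCKED / SP_LOCKED. */
#define MODEL_UNLOCKED 0
#define MODEL_LOCKED   1

typedef atomic_uchar model_spinlock_t;

/* Analogue of up_testset(): atomically write LOCKED, return the old value.
 * Acquire ordering keeps later accesses from moving before the lock. */
static int model_testset(model_spinlock_t *lock)
{
  return atomic_exchange_explicit(lock, MODEL_LOCKED, memory_order_acquire);
}

static void model_spin_lock(model_spinlock_t *lock)
{
  while (model_testset(lock) == MODEL_LOCKED)
    {
      /* Busy-wait; a real ARM port would typically use WFE here. */
    }
}

static void model_spin_unlock(model_spinlock_t *lock)
{
  /* Release ordering: writes made while holding the lock become visible
   * before other CPUs can observe the lock as free. */
  atomic_store_explicit(lock, MODEL_UNLOCKED, memory_order_release);
}

/* Hypothetical model state for a two-CPU system. */
static model_spinlock_t g_irqlock = MODEL_UNLOCKED; /* the "big" IRQ lock */
static model_spinlock_t g_setlock = MODEL_UNLOCKED; /* guards g_irqset */
static uint8_t g_irqset;      /* bit n set => CPU n holds g_irqlock */
static int g_irqcount[2];     /* per-CPU nesting depth (TCB count stand-in) */

static void model_enter(int cpu)
{
  if (g_irqcount[cpu] == 0)
    {
      model_spin_lock(&g_irqlock);    /* outermost entry takes the lock */

      model_spin_lock(&g_setlock);
      /* Mirrors the enter_critical_section() assert: a CPU entering for
       * the first time must not already be recorded as a holder. */
      assert((g_irqset & (1u << cpu)) == 0);
      g_irqset |= (uint8_t)(1u << cpu);
      model_spin_unlock(&g_setlock);
    }

  g_irqcount[cpu]++;
}

static void model_leave(int cpu)
{
  /* A CPU leaving must have entered and be recorded as the holder. */
  assert(g_irqcount[cpu] > 0 && (g_irqset & (1u << cpu)) != 0);

  if (--g_irqcount[cpu] == 0)
    {
      model_spin_lock(&g_setlock);
      g_irqset &= (uint8_t)~(1u << cpu);
      model_spin_unlock(&g_setlock);

      model_spin_unlock(&g_irqlock);  /* outermost exit drops the lock */
    }
}
```

If the exchange were not actually atomic on the target, two CPUs could both observe a lock as free, and the holder bits could then disagree with the nesting counts in the way both asserts report.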
Two questions:

1) What conditions have you seen trigger those specific asserts that we should look out for?
2) Do any of the current working SMP implementations support the I-cache?

Thank you,
Ryan