Hello,

I have suggested this before, and I insist it is a serious proposal:

* Reduce the number of commits that enter the NuttX upstream main branch each day.

There are several ways to do that, each with a different level of comfort and a different technical solution.

Sebastien


On 28/04/2025 18:41, Alan C. Assis wrote:
I just created an issue listing the actions:

Actions list to improve NuttX quality and reliability
https://github.com/apache/nuttx/issues/16278

I don't know if this is the right place, but at least I broke down the
actions, so it should be easier to update them individually.

BR,

Alan

On Sun, Apr 27, 2025 at 4:09 PM Nathan Hartman <hartman.nat...@gmail.com>
wrote:

I like all of these ideas and would like to add:
* Static analysis can find simple mistakes that might have been introduced.
Things like a function that forgets to return a value in some code path, or
use of an uninitialized variable, can be caught by static analysis.

By the way, did some recent change increase stack usage? If stacks are
overflowing, you will get all kinds of weird behaviors. Maybe git bisect
from a month or two ago and run some tests (e.g., run ostest a few times
for each commit being tested) and see what comes up?
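
For reference, this is roughly how stack coloration can catch that kind of
problem (a minimal generic sketch with made-up helper names, not NuttX's
internal API; NuttX has CONFIG_STACK_COLORATION for this): the stack is
pre-filled with a known pattern, and the first overwritten word marks the
peak usage.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STACK_COLOR 0xdeadbeefu

/* Paint the whole stack region before the thread starts running. */

static void stack_color(uint32_t *stack_base, size_t nwords)
{
  for (size_t i = 0; i < nwords; i++)
    {
      stack_base[i] = STACK_COLOR;
    }
}

/* Assuming a descending stack, count untouched words from the low end;
 * everything above them was used at some point. */

static size_t stack_peak_usage(const uint32_t *stack_base, size_t nwords)
{
  size_t unused = 0;

  while (unused < nwords && stack_base[unused] == STACK_COLOR)
    {
      unused++;
    }

  return (nwords - unused) * sizeof(uint32_t);
}

int main(void)
{
  uint32_t stack[256];

  stack_color(stack, 256);

  /* Simulate a thread that used the upper half of a descending stack. */

  for (size_t i = 128; i < 256; i++)
    {
      stack[i] = (uint32_t)i;
    }

  printf("peak usage: %zu of %zu bytes\n",
         stack_peak_usage(stack, 256), sizeof(stack));
  return 0;
}

If a task's peak usage is at, or suspiciously close to, its stack size,
that task is the first suspect.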

Nathan

On Sun, Apr 27, 2025 at 8:33 AM Alan C. Assis <acas...@gmail.com> wrote:

Dear NuttXers,

Over the last few weeks we have been seeing some degradation of NuttX
reliability, as some users have reported.

We saw it happening yesterday during our live video: the fb command
behaved in some very strange ways:
https://www.youtube.com/watch?v=pbq3suU3g5g&t=1740s

First it printed all the rectangles with pauses between them; then, in the
next test, it didn't work, and only after some time did the board start.

If you go back in the video, you will notice that the "uname" command also
took a long time to show its results. That was not expected; NuttX is
really fast.

We have already proposed creating automated tests to help improve NuttX,
but that alone is not enough: some features cannot be tested easily by
automated tests. For example, the audio tone feature was broken by a
commit around 2020 or earlier, and we only noticed it last year when
someone tried to use it.

So here are some suggestions that we could try, to help our project:

1) Automated Testing and CI Integration (this alone will only cover some cases)
It will help to detect, for example, whether the board is starting and
whether some tests (ostest, etc.) are passing.

2) Test Coverage Metrics
Integrate code coverage tools like gcov/lcov for unit tests, Dhrystone,
CoreMark, etc.
Display and track code coverage over time to identify untested parts of the
kernel, drivers, and libraries.

3) Expand and Improve Documentation
Improve Documentation/ to let end users test boards easily.
All boards should have basic instructions explaining how to install NuttX
on them; currently almost no board has these basic instructions. See, for
example:

https://nuttx.apache.org/docs/latest/platforms/arm/stm32f4/boards/stm32f4discovery/index.html

We should enhance board-specific installation guides:
     How to connect the board (serial, JTAG, SWD).
     How to flash NuttX (dfu-util, OpenOCD, vendor tools, etc.).
     How to configure a simple project (make menuconfig, selecting board
options).
Add "Getting Started" tutorials for total beginners.
Add troubleshooting sections for common problems.

4) Standardize Board Port Quality
Create a checklist for each board port to ensure minimum quality:
         Does ostest pass?
         Do basic drivers (UART, Timer, GPIO) work?
         Is SMP tested (if applicable)?
Boards that don't meet the minimum criteria would be marked as
"experimental" or "unsupported".

5) Better Unit Testing and Mocking
Expand the apps/testing suite with more unit tests.
Use frameworks like CMocka or extend the existing ostest usage.
Mock drivers and hardware to allow kernel logic testing even without
hardware (see the sketch below).
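
As a rough illustration of the mocking idea, here is a minimal CMocka
sketch. The sensor functions are hypothetical, and a real build would
normally substitute the mock at link time (e.g., with the linker's --wrap
option) rather than defining it in the same file:

#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <setjmp.h>
#include <cmocka.h>

/* Code under test (normally in its own file): scales a raw sample. */

int sensor_read_raw(void);  /* hardware access, to be mocked */

static int sensor_read_celsius(void)
{
  return (sensor_read_raw() * 125) / 1024;  /* hypothetical scaling */
}

/* Mock: returns whatever the test queued with will_return(). */

int sensor_read_raw(void)
{
  return (int)mock();
}

static void test_sensor_scaling(void **state)
{
  (void)state;
  will_return(sensor_read_raw, 1024);
  assert_int_equal(sensor_read_celsius(), 125);
}

int main(void)
{
  const struct CMUnitTest tests[] =
    {
      cmocka_unit_test(test_sensor_scaling),
    };

  return cmocka_run_group_tests(tests, NULL, NULL);
}

This way the conversion logic gets exercised on any host, with no board
attached.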

6) Stable API Guarantees
Formalize API stability between releases (similar to the "Stable API"
policy in the Linux kernel).
Document which APIs are considered stable and which are still experimental.
Add a deprecation process for removing/renaming public APIs (see the
sketch below).
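
A sketch of what the deprecation step could look like in practice; the
function names are made up, and the attribute syntax is GCC/Clang-specific:

#include <stdio.h>

int device_initialize(int devno);   /* new, stable API */

/* Deprecated alias, kept for one release cycle (example policy); any
 * caller gets a compile-time warning pointing at the replacement. */

__attribute__((deprecated("use device_initialize() instead")))
static inline int dev_init(int devno)
{
  return device_initialize(devno);
}

int device_initialize(int devno)
{
  printf("initializing device %d\n", devno);
  return 0;
}

int main(void)
{
  return dev_init(0);   /* emits a -Wdeprecated-declarations warning */
}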

7) Regression Testing
Maintain a regression test suite to ensure that previously fixed bugs do
not come back.
Basically, when someone finds an issue, they should create a test to be
integrated into ostest to detect it in the future (see the sketch below).
Set up automatic re-runs of regression tests in CI when code is merged.
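
For example, a regression test for a hypothetical past bug where
sem_timedwait() returned the wrong error on timeout could look like this
(a minimal sketch, not an actual ostest case):

#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>

/* Regression test: sem_timedwait() on an unavailable semaphore must
 * time out with a -1 return and errno == ETIMEDOUT. */

static void regress_sem_timedwait(void)
{
  sem_t sem;
  struct timespec abstime;
  int ret;

  sem_init(&sem, 0, 0);                 /* semaphore starts unavailable */
  clock_gettime(CLOCK_REALTIME, &abstime);
  abstime.tv_sec += 1;                  /* wait at most one second */

  ret = sem_timedwait(&sem, &abstime);
  assert(ret == -1 && errno == ETIMEDOUT);
  printf("regress_sem_timedwait: PASSED\n");

  sem_destroy(&sem);
}

int main(void)
{
  regress_sem_timedwait();
  return 0;
}

Once integrated, the bug can never silently come back without CI noticing.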

8) Other Performance Benchmark Improvements
Create standard performance tests (a benchmark sketch follows this list):
Boot time benchmarks
Context switch time
Interrupt latency
Track performance regressions automatically in CI.
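
As a sketch of a context-switch benchmark using only POSIX primitives:
two threads ping-pong on a pair of semaphores, so each round trip costs
two switches plus semaphore overhead. The number is therefore a relative
metric for tracking regressions, not an absolute latency:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>

#define ITERATIONS 10000

static sem_t g_ping;
static sem_t g_pong;

static void *partner(void *arg)
{
  for (int i = 0; i < ITERATIONS; i++)
    {
      sem_wait(&g_ping);   /* wake when main posts ...            */
      sem_post(&g_pong);   /* ... and hand control straight back */
    }

  return NULL;
}

int main(void)
{
  pthread_t tid;
  struct timespec start;
  struct timespec end;

  sem_init(&g_ping, 0, 0);
  sem_init(&g_pong, 0, 0);
  pthread_create(&tid, NULL, partner, NULL);

  clock_gettime(CLOCK_MONOTONIC, &start);
  for (int i = 0; i < ITERATIONS; i++)
    {
      sem_post(&g_ping);
      sem_wait(&g_pong);
    }

  clock_gettime(CLOCK_MONOTONIC, &end);
  pthread_join(tid, NULL);

  long long ns = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
                 + (end.tv_nsec - start.tv_nsec);
  printf("avg round trip: %lld ns over %d iterations\n",
         ns / ITERATIONS, ITERATIONS);
  return 0;
}

Run on every merge, a jump in the average immediately flags the offending
commit.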

9) Create documentation templates under Documentation/ to be used as a
reference for boards and other common documentation.

Another idea that we could implement to validate that the most important
peripherals of every arch are working as expected: create a base board
(mainboard) with many important peripherals (sensors, audio, Ethernet,
USB) and a "cartridge" board to be connected to it (we could use some
existing standard like the Raspberry Pi Compute Module CM4S SODIMM:
https://datasheets.raspberrypi.com/cm4s/cm4s-datasheet.pdf or the SparkFun
MicroMod: https://www.sparkfun.com/micromod). The good thing about using
MicroMod is that there are already a lot of microcontroller "cartridge"
boards that we could use.

Please let me know what you guys think, and we can plan the actions to
make it happen!

BR,

Alan
