I just created an Issue listing the actions: Actions list to improve NuttX
quality and reliability: https://github.com/apache/nuttx/issues/16278
I don't know if this is the right place, but at least I broke the actions
down, so it should be easier to update each one individually.

BR,

Alan

On Sun, Apr 27, 2025 at 4:09 PM Nathan Hartman <hartman.nat...@gmail.com>
wrote:

> I like all of these ideas and would like to add:
> * Static analysis can find simple mistakes that might have been
> introduced. Things like a function that forgets to return a value in
> some code path, or use of an uninitialized variable, can be caught by
> static analysis.
>
> By the way, did some recent change increase stack usage? If stacks are
> overflowing, you will get all kinds of weird behaviors. Maybe git bisect
> since a month or two ago and run some tests (like running ostest a few
> times for each commit being tested) and see what comes up?
>
> Nathan
>
> On Sun, Apr 27, 2025 at 8:33 AM Alan C. Assis <acas...@gmail.com> wrote:
> >
> > Dear NuttXers,
> >
> > In the last few weeks we have been seeing some degradation of NuttX
> > reliability, as some users have reported.
> >
> > We saw it happen yesterday during our live video: the fb command
> > behaved in some very strange ways:
> > https://www.youtube.com/watch?v=pbq3suU3g5g&t=1740s
> >
> > First it printed all the rectangles with pauses between them; then in
> > the next test it didn't work at all, and only after some time did the
> > board start.
> >
> > If you go back in the video you will notice that the "uname" command
> > also took a long time to show its results. That was not expected;
> > NuttX is really fast.
> >
> > We have already proposed creating automated tests to help improve
> > NuttX, but that alone is not enough. Some features cannot be tested
> > easily by automated tests. For example, the audio tone was broken by
> > a commit around 2020 or earlier, and we only noticed it last year
> > when someone tried to use it.
> >
> > So these are some suggestions that could help our project:
> >
> > 1) Automated Test and CI Integration (will only cover some corner cases)
> > This will help detect, for example, whether a board is not starting
> > and whether some tests (ostest, etc.) are passing.
> >
> > 2) Test Coverage Metrics
> > Integrate code coverage tools like gcov/lcov for unit tests,
> > dhrystone, coremark, etc.
> > Display and track code coverage over time to identify untested parts
> > of the kernel, drivers, and libraries.
> >
> > 3) Expand and Improve Documentation
> > Improve Documentation/ to let end users test boards easily.
> > All boards should have basic instructions explaining how to install
> > NuttX on them; currently almost no board has these basic
> > instructions, e.g.:
> > https://nuttx.apache.org/docs/latest/platforms/arm/stm32f4/boards/stm32f4discovery/index.html
> > We should enhance board-specific installation guides:
> > How to connect the board (serial, JTAG, SWD).
> > How to flash NuttX (dfu-util, OpenOCD, vendor tools, etc.).
> > How to configure a simple project (make menuconfig, selecting board
> > options).
> > Add "Getting Started" tutorials for total beginners.
> > Add troubleshooting sections for common problems.
> >
> > 4) Standardize Board Port Quality
> > Create a checklist for each board port to ensure minimum quality:
> > Does ostest pass?
> > Do basic drivers (UART, Timer, GPIO) work?
> > Is SMP tested (if applicable)?
> > Boards that don't meet the minimum criteria are marked as
> > "experimental" or "unsupported".
> >
> > 5) Better Unit Testing and Mocking
> > Expand the apps/testing suite with more unit tests.
> > Use frameworks like CMocka or extend the existing ostest usage.
> > Mock drivers and hardware to allow kernel logic testing even without
> > hardware; see the sketch just below.
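> >
> > As a rough sketch only (CMocka is just one option, and the
> > mock_uart_getc() and count_line_chars() functions below are
> > hypothetical, not an existing NuttX API), such a host-side test with
> > a mocked driver could look like:
> >
> >   #include <stdarg.h>
> >   #include <stddef.h>
> >   #include <setjmp.h>
> >   #include <stdint.h>
> >   #include <cmocka.h>
> >
> >   /* Hypothetical mock standing in for a UART driver, so the logic
> >    * under test can run on a host PC without real hardware.
> >    */
> >
> >   static int mock_uart_getc(void)
> >   {
> >     return (int)mock(); /* Returns the value queued by will_return() */
> >   }
> >
> >   /* Hypothetical logic under test: count characters up to a newline */
> >
> >   static int count_line_chars(int (*getc_fn)(void))
> >   {
> >     int n = 0;
> >     while (getc_fn() != '\n')
> >       {
> >         n++;
> >       }
> >
> >     return n;
> >   }
> >
> >   static void test_count_line_chars(void **state)
> >   {
> >     (void)state;
> >     will_return(mock_uart_getc, 'o');
> >     will_return(mock_uart_getc, 'k');
> >     will_return(mock_uart_getc, '\n');
> >     assert_int_equal(count_line_chars(mock_uart_getc), 2);
> >   }
> >
> >   int main(void)
> >   {
> >     const struct CMUnitTest tests[] =
> >     {
> >       cmocka_unit_test(test_count_line_chars),
> >     };
> >
> >     return cmocka_run_group_tests(tests, NULL, NULL);
> >   }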
> >
> > 6) Stable API Guarantees
> > Formalize API stability between releases (similar to the "stable API"
> > policy in the Linux kernel).
> > Document which APIs are considered stable and which are still
> > experimental.
> > Add a deprecation process for removing/renaming public APIs.
> >
> > 7) Regression Testing
> > Maintain a regression test suite to ensure that previously fixed bugs
> > do not come back.
> > Basically, when someone finds an issue they should create a test, to
> > be integrated into ostest, that will detect it in the future.
> > Set up automatic re-runs of regression tests in CI when code is merged.
> >
> > 8) Other Performance Benchmark Improvements
> > Create standard performance tests (see the P.S. below for a sketch):
> > Boot time benchmarks
> > Context switch time
> > Interrupt latency
> > Track performance regressions automatically in CI.
> >
> > 9) Create Documentation/Templates to be used as a reference for
> > boards and other common documentation
> >
> > Another idea that we could implement to validate that the most
> > important peripherals of every arch are working as expected: create a
> > base board (mainboard) with many important peripherals (sensors,
> > audio, Ethernet, USB) and a "cartridge" board to be connected to it.
> > We could use some existing standard like the Raspberry Pi Compute
> > Module CM4S SODIMM (https://datasheets.raspberrypi.com/cm4s/cm4s-datasheet.pdf)
> > or SparkFun MicroMod (https://www.sparkfun.com/micromod). The good
> > thing about using MicroMod is that there are already a lot of
> > microcontroller "cartridge" boards that we could use.
> >
> > Please let me know what you guys think, and we can plan the actions
> > to make it happen!
> >
> > BR,
> >
> > Alan
> >
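> > P.S. Regarding item 8, as a rough sketch only (the iteration count
> > is arbitrary, and a real test would also pin the scheduler policy and
> > priorities), a context switch micro-benchmark using standard POSIX
> > calls could look like:
> >
> >   #include <pthread.h>
> >   #include <sched.h>
> >   #include <stdint.h>
> >   #include <stdio.h>
> >   #include <time.h>
> >
> >   #define ITERATIONS 10000
> >
> >   /* Two equal-priority threads yielding to each other force a
> >    * context switch on (roughly) every sched_yield() call.
> >    */
> >
> >   static void *yielder(void *arg)
> >   {
> >     for (int i = 0; i < ITERATIONS; i++)
> >       {
> >         sched_yield();
> >       }
> >
> >     return NULL;
> >   }
> >
> >   int main(void)
> >   {
> >     struct timespec start;
> >     struct timespec end;
> >     pthread_t t1;
> >     pthread_t t2;
> >     int64_t ns;
> >
> >     clock_gettime(CLOCK_MONOTONIC, &start);
> >     pthread_create(&t1, NULL, yielder, NULL);
> >     pthread_create(&t2, NULL, yielder, NULL);
> >     pthread_join(t1, NULL);
> >     pthread_join(t2, NULL);
> >     clock_gettime(CLOCK_MONOTONIC, &end);
> >
> >     ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000 +
> >          (end.tv_nsec - start.tv_nsec);
> >
> >     /* Each iteration yields in both threads, i.e. ~2 switches */
> >
> >     printf("avg per yield: %lld ns\n",
> >            (long long)(ns / (2 * ITERATIONS)));
> >     return 0;
> >   }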