Hi Nathan,

On 2019/12/20 02:51:56, Nathan Hartman <hartman.nat...@gmail.com> wrote: 
> On Thu, Dec 19, 2019 at 6:24 PM Gregory Nutt <spudan...@gmail.com> wrote:
> > >> ] A bad build system change can cause serious problems for a lot of
> > >> people around the world.  A bad change in the core OS can destroy the
> > >> good reputation of the OS.
> > > Why is this the case? Users should not be using unreleased code or be
> > > encouraged to use it. If they are, one solution is to make more frequent
> > > releases.
> > I don't think that the number of releases is the factor.  It is time in
> > people's hands.  Subtle corruption of OS real-time behavior is not easily
> > tested.  You normally have to specially instrument the software and set
> > up a special test environment, perhaps with a logic analyzer, to detect
> > these errors.  Errors in the core OS can persist for months and, in at
> > least one case I am aware of, years, until someone sets up the correct
> > instrumented test.
> 
> And:
> 
> On Thu, Dec 19, 2019 at 4:20 PM Justin Mclean <jus...@classsoftware.com>
> wrote:
> > > ] A bad build system change can cause serious problems for a lot of
> > > people around the world.  A bad change in the core OS can destroy the
> > > good reputation of the OS.
> >
> > Why is this the case? Users should not be using unreleased code or be
> > encouraged to use it. If they are, one solution is to make more frequent
> > releases.
> 
> Many users are only using released code. However, whatever is in "master"
> eventually gets released. So if problems creep in unnoticed, downstream
> users will be affected. It is only delayed.
> 
> I can personally attest that those kinds of errors are extremely difficult
> to detect and trace. It does require a special setup with logic analyzer or
> oscilloscope, and sometimes other tools, not to mention a whole setup to
> produce the right stimuli, several pieces of software that may have to be
> written specifically for the test....
> 
> I have been wracking my brain on and off thinking about how we could set up
> an automated test system to find errors related to timing etc.
> Unfortunately, unlike ordinary software for which you can write an automated
> test suite, this sort of embedded RTOS will need specialized hardware to
> conduct the tests. That's a subject for another thread and I don't know if
> now is the time, but I will post my thoughts eventually.
> 
> Nathan
> 

From the proposal:

"Community

NuttX has a large, active community.  Communication is via a Google group at 
https://groups.google.com/forum/#!forum/nuttx where there are 395 members as of 
this writing.  Code is currently maintained at Bitbucket.org at 
https://bitbucket.org/nuttx/.  Other communications are through Bitbucket 
issues and also via Slack for focused, interactive discussions."


> Many users are only using released code.

Can we ask the 395 members?

I can only share my experience with NuttX since I began working on the project 
in 2012 for multiple companies.

Historically (based on my time on the project), releases were build tested. By 
that I mean the configurations were updated, and doing so created a set of 
"Build Test Vectors" (BTV). To get a sense of the number of permutations: the 
configuration variable documentation at 
http://nuttx.org/doku.php?id=documentation:configvars alone lists 95,338 
CONFIG_* hits. Yes, there are duplicates on the page and dependencies between 
options; the number is just meant to convey how many bits are involved.

The total space is very large.

The BTV coverage of that space was very sparse.
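
To put rough numbers on that sparseness, here is a back-of-the-envelope sketch 
(Python; both constants are hypothetical, since the 95,338 figure above counts 
raw page hits rather than independent options):

        # Sketch: size of the CONFIG_* space vs. what a one-day build sweep touches.
        # Both constants are hypothetical, deliberately conservative placeholders.
        N_INDEPENDENT_BOOLS = 200   # far below the raw CONFIG_* hit count above
        BUILDS_PER_DAY = 500        # a generous one-day BTV sweep

        total_space = 2 ** N_INDEPENDENT_BOOLS
        coverage = BUILDS_PER_DAY / total_space

        print(f"configuration space : 2^{N_INDEPENDENT_BOOLS} ~ {total_space:.2e}")
        print(f"one-day BTV coverage: {coverage:.2e}")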

IIRC Greg gave the build testing task a day of time. It was repeated after 
errors were found.  I am not aware of any other testing. Are you?

There were no release candidate (rc), alpha, or beta tests that ran this code 
on real systems, and there were very few, if any, Run Test Vectors (RTV). I 
have never seen a test report - has anyone?

One way to look at this is Sporadic Integration (SI) with limited BTV and 
minimal RTV. Total Test Vector Coverage: TTVC = BTV + RTV. The ROI of this way 
of working, from a reliability perspective, was and is very small.
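
To make the arithmetic concrete (a sketch with made-up entries, not measured 
data), think of BTV and RTV as sets of exercised board:config pairs and TTVC 
as their union:

        # Sketch: BTV and RTV as sets of exercised (board, config) pairs.
        # The pairs below are hypothetical placeholders.
        btv = {("sim", "nsh"), ("sim", "ostest"), ("stm32f4discovery", "nsh")}
        rtv = {("stm32f4discovery", "nsh")}  # the rare run-tested subset

        ttvc = btv | rtv                     # Total Test Vector Coverage
        print(f"BTV={len(btv)}  RTV={len(rtv)}  TTVC={len(ttvc)}")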

A herculean effort on Greg's part with little return: we released code with 
many significant and critical errors in it. See the ReleaseNotes and the 
commit log.

Over the years Greg referred to TRUNK (yes, it was on SVN) and master as his 
"own sandbox," stating it should not be considered stable or buildable. This 
is evident in the commit log.

I have personally never used a release from a tarball. Given the above, why 
would I? It is less stable than master at TC = N 
(https://www.electronics-tutorials.ws/rc/rc_1.html), where N is some number of 
days after a release. Unfortunately, based on the current practices (a very 
unprofessional workflow), N is also dictated by when apps and nuttx actually 
build for a given target's set of BTV.

With the tools and resources that exist today, quite frankly, this is 
unacceptable and an embarrassment.

I suspect this is why there is a Tizen. The modern era gets it. (Disclaimer: I 
am an old dog - I am learning to get it.)

--- Disclaimer ---

In the following, I am not bragging about PX4 or selling tools; I am merely 
trying to share our experiences for the betterment of NuttX.
 
From what I understand, PX4 has the most instances of NuttX running on real HW 
in the world: over 300K. (I welcome other users to share their numbers.)

PX4's total TTVC is still limited, but much, much greater than NuttX's.

We use Continuous Integration (CI) on NuttX within PX4 on every commit on PRs:

        C/C++ CI / build (push) Successful in 3m
        Compile MacOS Pending — This commit is being built
        Compile All Boards — This commit looks good
        Hardware Test — This commit looks good
        SITL Tests — This commit looks good
        SITL Tests (code coverage) — This commit looks good
        ci/circleci — Your tests passed on CircleCI!
        continuous-integration/appveyor/pr — AppVeyor build succeeded
        continuous-integration/jenkins/pr-head — This commit looks good
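
For anyone who has not seen such a pipeline, the build stage behind a check 
like "Compile All Boards" boils down to something like this sketch 
(hypothetical script, not our actual CI code; it assumes a configure entry 
point like nuttx/tools/configure.sh and a made-up target list):

        #!/usr/bin/env python3
        """Hypothetical per-PR BTV sweep: configure and build each target and
        fail the check if any of them breaks.  Not the actual PX4 CI code."""
        import subprocess
        import sys

        # Hypothetical subset of build test vectors (board:config pairs).
        TARGETS = ["sim:nsh", "stm32f4discovery:nsh", "sim:ostest"]

        def build(target: str) -> bool:
            """Configure and build one target in the nuttx tree; True on success."""
            # Ignore distclean failures on an already-pristine tree.
            subprocess.run(["make", "distclean"], cwd="nuttx")
            for cmd in (["./tools/configure.sh", target], ["make", "-j8"]):
                if subprocess.run(cmd, cwd="nuttx").returncode != 0:
                    return False
            return True

        failures = [t for t in TARGETS if not build(t)]
        if failures:
            print("FAILED:", ", ".join(failures))
            sys.exit(1)
        print("all build test vectors passed")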


We run tests on HW.

http://ci.px4.io:8080/blue/organizations/jenkins/PX4_misc%2FFirmware-hardware/detail/pr-mag-str-preflt/1/pipeline

I say limited because of the set of architectures we use and the way we 
configure the OS.

I believe this to be true of all users. 

The benefit of a community is that the sum of all TTVC finds the problems and 
fixes them.

Why not maximize TTVC, when it will have a huge ROI and it is free?

PX4 will contribute all that we have. We just need a temporally consistent 
build. Yes, he is on the submodule thing AGAIN :)
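
To show what I mean by a temporally consistent build, here is a minimal sketch 
(hypothetical SHAs and repo paths; it applies whether the pinning is done with 
submodules or a plain pin file): record the exact nuttx and apps commits known 
to build together, and refuse to build anything else.

        # Sketch: enforce a temporally consistent nuttx + apps pair.
        # The SHAs and directory names below are hypothetical placeholders.
        import subprocess

        PINS = {
            "nuttx": "0123456789abcdef0123456789abcdef01234567",
            "apps":  "89abcdef0123456789abcdef0123456789abcdef",
        }

        def head_sha(repo: str) -> str:
            """Return the commit currently checked out in the given repo directory."""
            out = subprocess.run(["git", "rev-parse", "HEAD"], cwd=repo,
                                 capture_output=True, text=True, check=True)
            return out.stdout.strip()

        for repo, pinned in PINS.items():
            actual = head_sha(repo)
            if actual != pinned:
                raise SystemExit(f"{repo} is at {actual[:12]}, expected {pinned[:12]}")
        print("nuttx and apps are pinned to a consistent pair")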


David
