On 8/19/2021 18:03, Chris Johns wrote:
On 20/8/21 4:55 am, Kinsey Moore wrote:
On 8/19/2021 13:32, Gedare Bloom wrote:
On Thu, Aug 19, 2021 at 11:43 AM Kinsey Moore <kinsey.mo...@oarcorp.com> wrote:
I've seen these failures on my local system, in our CI, and on a build
server that I sometimes use for development/testing, so if it's a
configuration issue we're being pretty consistent about misconfiguration
across some pretty different environments (docker, bare-metal, VM,
different OSs, different QEMU versions). I've seen enough of the
spintrcritical tests fail sporadically on QEMU to lump them all into
this category. These are also tests that I have seen behave badly on
ARMv7 QEMU on my local system (which doesn't rule out misconfiguration,
but it's another data point).

Yes, for example, it may be a matter of qemu process counts spawned by
rtems-test, and the order in which tests get invoked could determine
which ones don't work. I could easily see this happening: each test's
runtime will be fairly consistent, so you'll often see the same tests
running concurrently with each other. But if you change the order
(e.g., by adding new tests), then we may see a new set of sporadically
failing testcases. Will we just add those, or do we need to re-examine
this indeterminate set periodically? Who will maintain this list?
That's kind of the root of my concern here.
I understand your concern about maintenance of the failure list and I
don't have a good answer for you. I imagine going forward it would be a
combination of the current stakeholders for a given BSP and anyone who
watches the automated build output from Joel's runs for these kinds of
issues.

On the other hand, if we don't mark those tests, people will get
fatigued looking at the spurious failures and assume any new ones just
fall into the same category as the others. At that point, is it even
worth running the automated tests for that platform?

As for your worry about marking these indeterminate, they're only being
marked as such for QEMU BSPs. The ZynqMP hardware BSP doesn't have
these testing carve-outs and runs all these tests flawlessly.
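For illustration, the carve-outs live in the QEMU BSP's testsuite
configuration; a sketch, assuming the usual rtems-test "state: test"
.tcfg syntax (the file location and test list here are illustrative,
not the actual patch contents):

        # Illustrative only: QEMU-only carve-outs in the qemu BSP's
        # *-testsuite.tcfg; the hardware BSP's config has no such
        # entries. Tests that fail sporadically under QEMU load are
        # tagged indeterminate so their results are reported without
        # counting as hard failures.
        indeterminate: psx12
        indeterminate: spcpucounter01
        indeterminate: sptimecounter02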
Great, this is important.

These failures become much more common when there is other load on the
system, and a lot of them disappear when you limit the tester to a
single QEMU instance at a time.

I'm wondering if we should sacrifice testing speed for
coverage/quality. If throttling rtems-test leads to more reliable test
results, then it may be a better option than basically ignoring a
swath of our testsuite.
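As a sketch of what throttling would look like (assuming rtems-test's
--jobs option; the BSP name and testsuite path are placeholders):

        $ rtems-test --rtems-bsp=xilinx_zynq_a9_qemu \
                     --jobs=1 \
                     build/arm/xilinx_zynq_a9_qemu/testsuites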
That would certainly mitigate some of the failures, but you'd also have to
guarantee nothing else is running on the system which could cause the same
problem. I know at least some of the current automated runs operate on a
shared system which can and does often have other intensive processes
running on it. There are also the tests that are sporadic on QEMU even
without additional load.
What is it in these tests, when combined with qemu, that causes them to
fail? Is there some relation to a real clock, some shared host
resource, or a bug in qemu? I am concerned a simulator can vary like
this based on the host's load, and it makes me wonder how people use it
on machines that host a number of VMs.
I experienced very similar results on an ARMv7 BSP (not Zynq) and
assumed that this was a known/accepted problem with QEMU when the same
issues popped up on AArch64. My local system, under no other load,
produces these failures for the Zynq A9 QEMU BSP:

        "failed": [
            "spcpucounter01.exe",
            "psxtimes01.exe",
            "sp69.exe",
            "psx12.exe",
            "minimum.exe",
            "dl06.exe",
            "sptimecounter02.exe"
        ],

minimum.exe and dl06.exe are probably unrelated, but the remainder are in my
problem set for AArch64 on QEMU.

A run of the AArch64 ZynqMP ILP32 BSP produced these failures under the same
conditions with all the test carve-outs removed:

        "failed": [
            "psx12.exe",
            "spcpucounter01.exe",
            "sptimecounter01.exe",
            "sptimecounter02.exe",
            "sp04.exe"
        ],

Because of my experience with the aforementioned ARMv7 BSP and the lack
of failures on hardware, I chose not to chase down the root cause of
the failures under QEMU. More than anything else, this patch is
documentation of our observations across multiple architectures and
BSPs running on QEMU.
With this volume of tests being tagged this way, I feel we should have
a better understanding of the problem, and with that a means to track
how it gets resolved. As Gedare has kindly stated, once pushed, this
change disappears into a dark corner and we have no means to track it.

The other solution is to set `jobs` to `1` in this BSP's tester config, again
something Gedare has raised. It means we get better or even valid results. What
is more important, valid results or running the testsuite as fast as possible?
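As a sketch of what I mean, assuming the per-BSP ini format the tester
uses (the file name and the keys other than `jobs` are illustrative):

        # Illustrative tester BSP config forcing serial test runs.
        [xilinx_zynqmp_ilp32_qemu]
        bsp    = xilinx_zynqmp_ilp32_qemu
        arch   = aarch64
        jobs   = 1
        tester = %{_rtscripts}/qemu.cfg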
I fully support dropping the number of jobs to "half" or 1 for better
results on QEMU runs that display these problems. My comment in that
regard was that other system loading (or multiple simultaneous test
runs) can also cause the same problem, and so this is only a partial
solution. Barring a fix for RTEMS or QEMU for these load-dependent and
sporadic failures, this at least still needs to be documented in some
form.


Kinsey
