I don't know how to answer that question directly, so let me explain my
procedure for running the test system. The process goes like this:
* Run the system (about 27 hours for this step; the rest of the process
takes about 2 days in the R2016a release cycle)
* Collect results
* Check the passing percentage and number of tests run. This is where
the issue of test system changes first hits.
o When the percentage drops, that is a flag for me to investigate
why.
o The percentage passed is the number that external folks (the
ones that read my weekly and monthly reports) are watching, so I
end up writing explanations about why the passing percentage
drops as the system gets closer to release. I really don't like
it when that starts sounding like making excuses -- "it's the
test system's fault." Sigh.
* Difference the current report with the previous one, looking at
everything reported in the "All known issues" list.
* Run every new script* that reports an issue, examine the output, and
determine why the script failed.
o * Truth is, I don't run every new script if I can track down a
systematic explanation for the failure -- e.g., 3 scripts with
nearly identical names all reporting a failure at the same line
number ("#24 failed"), matching another set of 3 nearly
identically named scripts that failed with the same reported
failure. Once a pattern is clear, I tend to start spot checking.
Or I get cross-eyed.
* Run every script that shows a change in the failure report. For
example, if "#27 failed" changes to "#5 failed", I look to see why.
* Write up the changes (that is, collect the changes in a spreadsheet
for later comparisons).
o This piece includes noting the total number of scripts. I
expect this comparison to hold (a sketch of these checks
follows this list):
total issues (now) <= total issues (previous)
+ number of scripts (now) - number of scripts (previous)
o I also expect the number of new failures reported to match the
count of scripts newly reporting failures, adjusted for the
number of scripts that changed from failing to passing. In
other words, I check whether the number of new failures tracks
with the number of new lines in the list of failures, adjusted
for the number that moved from failed to passed. If not, I look
for the missing scripts -- what is in neither list?
o In other words, I expect internal consistency in the data
reports. (Deleting scripts breaks this piece, incidentally. It's
better, this late in the cycle, to move them to the deferred list.)
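Since most of this is bookkeeping, here is a minimal Python sketch of
the checks above, assuming each run's "All known issues" section has
been parsed into a dict mapping script name to the reported failure.
The function name, script names, and counts are illustrative only, not
the actual report format:

    # Sketch only: prev/now map script name -> reported failure text.
    def compare_runs(prev, now, prev_script_count, now_script_count,
                     reported_new_failures=None):
        prev_failing, now_failing = set(prev), set(now)

        new_failures = now_failing - prev_failing   # newly reporting an issue
        now_passing = prev_failing - now_failing    # moved from failing to passing
        changed = {s for s in prev_failing & now_failing
                   if prev[s] != now[s]}            # e.g. "#27 failed" -> "#5 failed"

        # Check 1: total issues should grow no faster than the script count.
        issues_ok = (len(now_failing) <=
                     len(prev_failing) + (now_script_count - prev_script_count))

        # Check 2: the "new failures" count in the report summary should match
        # the new lines in the failure list, adjusted for scripts that moved
        # from failing to passing. If not, something is in neither list
        # (usually a deleted script).
        expected_new = len(now_failing) - len(prev_failing) + len(now_passing)
        count_ok = (reported_new_failures is None or
                    reported_new_failures == expected_new)

        return {"new": sorted(new_failures), "fixed": sorted(now_passing),
                "changed": sorted(changed),
                "issues_ok": issues_ok, "count_ok": count_ok}

    # Made-up example:
    prev = {"Cb_Prop_01": "#24 failed", "Force_Drag_03": "#27 failed"}
    now = {"Cb_Prop_01": "#24 failed", "Force_Drag_03": "#5 failed",
           "Force_SRP_04": "#12 failed"}
    print(compare_runs(prev, now, prev_script_count=1200,
                       now_script_count=1205, reported_new_failures=1))

When the checks fail, the leftover names point at the scripts I have to
track down by hand (or at scripts that were deleted rather than
deferred).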
I don't look at the report below the "New Issues (Deferred)" entry,
except to see 2 things:
1. Are any changes reported there? (My diff tool color codes changes,
so they are easy to spot.)
2. Did a script failure reported on a previous run, and not in the
fixed list, move to the deferred list? (A sketch of this check
follows.)
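For item 2, the check amounts to a set difference. The list names below
are assumptions about how the parsed report sections would be held in
memory, not the report's actual layout:

    # Sketch only: the three sets are placeholders for parsed report sections.
    previously_failing = {"Force_Drag_03", "Cb_Prop_02"}
    fixed_list = {"Cb_Prop_02"}
    deferred_list = {"Force_Drag_03"}

    # Anything that failed last run and is neither fixed nor deferred
    # needs a closer look.
    unaccounted = previously_failing - fixed_list - deferred_list
    print("Unaccounted-for failures:", sorted(unaccounted))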
The time-consuming piece of this process is digging into and running
the scripts that report failures in order to understand the nature of
each failure.
When (as was mostly the case for R2015a) the test system is (mostly)
frozen at QA complete, it is easy to track the test run changes as we
approach release. When the test system changes from run to run
(scripts added and deleted, truth data changing from run to run), the
process becomes much less manageable and stretches the test analysis
out in time (2 days vs. a couple of hours).
The plan to make the test system robust across platforms should help
with some of this -- we'll drop from 100+ failures on Linux to something
more manageable. (Of course, these are mostly false failures, but they
have to be at least spot-checked to make sure we didn't introduce
something on Linux that isn't seen on Windows.)
That won't particularly address the timing of system changes breaking
the statistics as release approaches. That's why we need tests to be in
place at QA complete, so that we have a valid metric for progress
towards release. (The accelerated schedule for this release -- outside
of our control -- hasn't helped on that front for R2016a.)
- DJC
On 08/30/2016 08:03 AM, Cooley, D S. (GSFC-5950) wrote:
Darrel,
Would it be OK for SES to add "Deferred" tests (temporarily) if he needed to?
I don't think this affects the statistics.
Steve
-----Original Message-----
From: Darrel J. Conway [mailto:djci...@gmail.com]
Sent: Tuesday, August 30, 2016 10:55 AM
To: Slojkowski, Steven E. (GSFC-595.0)[OMITRON] <steven.e.slojkow...@nasa.gov>;
gmat-buildtest@lists.sourceforge.net
Subject: Re: [Gmat-buildtest] Test results: 2016-08-25
(Catsclaw/RHEL-Linux/GmatConsole-64/M2016a/GCC4.8.5)
Thanks. Once we have the leap second issue in place (and anything else CCB
says to wait for tomorrow), I'll start a RHEL test run.
- DJC
On 08/30/2016 05:41 AM, Slojkowski, Steven E. (GSFC-595.0)[OMITRON] wrote:
I added test cases for the tests discussed on Monday. Some work, others are
just templates. I will be working on getting these scripts into running shape,
but I won't add any new ones (at least not this week).
-----Original Message-----
From: Darrel J. Conway [mailto:djci...@gmail.com]
Sent: Saturday, August 27, 2016 9:58 PM
To: gmat-buildtest@lists.sourceforge.net
Cc: Slojkowski, Steven E. (GSFC-595.0)[OMITRON]
Subject: Re: [Gmat-buildtest] Test results: 2016-08-25
(Catsclaw/RHEL-Linux/GmatConsole-64/M2016a/GCC4.8.5)
Test results on the RC2 candidate. 98.91% (another drop), and the reported
failures all look like they are test scripts committed since the 8/25 run.
I'll go through the details on Monday. I suspect they are largely
platform differences producing false failures.
Expect more whining Monday. Not sure I have the energy for it on the
weekend! :-/
- DJC
--
Darrel J. Conway, Ph.D. Thinking Systems, Inc.
Senior Scientist and CEO 6441 N Camino Libby
Phone: (623) 298-4530 Tucson, AZ 85718
FAX: (520) 232-2533 www.thinksysinc.com
Cell: (520) 425-3626 d...@thinksysinc.com