I don't know how to answer that question directly, so let me explain my procedure. When I run the test system, the process goes like this:

 * Run the system (27 hrs for this step; the rest of the process takes
   about 2 days in the R2016a release cycle)
 * Collect results
 * Check the passing percentage and number of tests run. This is where
   the issue of test system changes first hits.
     o When the percentage drops, that is a flag for me to investigate
       why.
     o The percentage passed is the number that external folks (the
       ones who read my weekly and monthly reports) are watching, so I
       end up writing explanations for why the passing percentage drops
       as the system gets closer to release.  I really don't like it
       when that starts sounding like making excuses -- "it's the test
       system's fault."  Sigh.
 * Difference the current report with the previous one, looking at
   everything reported in the "All known issues" list.
 * Run every new script* that reports an issue, examine the output, and
   determine why the script failed.
     o * Truth is, I don't run every new script if I can track down a
       systematic explanation for the failure -- e.g. 3 scripts with
       nearly identical names reporting a failure at the same line
       number ("#24 failed"), matching another set of 3 nearly
       identically named scripts that failed with the same reported
       failure.  Once a pattern is clear, I tend to start spot
       checking.  Or go cross-eyed.
 * Run every script that shows a change in the failure report. For
   example, if "#27 failed" changes to "#5 failed", I look to see why.
 * Write up the changes (that is, collect the changes in a spreadsheet
   for later comparisons).
     o This piece includes noting the total number of scripts.  I
       expect that this comparison holds:

            total issues (now) <= total issues (previous)
                                  + number of scripts (now)
                                  - number of scripts (previous)

     o I also expect the change in the number of failures reported to
       match the diff of the failure list: the count of scripts newly
       reporting failures, minus the count of scripts that moved from
       failing to passing.  If those numbers don't agree, I look for
       the missing scripts -- what is in neither list?
     o In other words, I expect internal consistency in the data
       reports; a rough sketch of these checks, in code form, follows
       this list.  (Deleting scripts breaks this piece, incidentally.
       It's better, this late in the cycle, to move them to the
       deferred list.)
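
Here is roughly what those consistency checks look like as code. This is only a sketch -- the function and input names are mine for illustration, and I'm assuming the failure lists have already been parsed into script-name to reported-failure dictionaries. The test system doesn't hand me anything like this; it's the bookkeeping I do by hand in the spreadsheet:

    # Sketch of the report bookkeeping described above (illustrative names).
    # prev_failures / curr_failures: dict mapping script name -> reported
    #     failure text (e.g. "#24 failed") from the "All known issues" list
    # prev_total / curr_total: total issue counts claimed by each report
    # prev_scripts / curr_scripts: total number of scripts in each run
    def check_report_consistency(prev_failures, curr_failures,
                                 prev_total, curr_total,
                                 prev_scripts, curr_scripts):
        new_failures = set(curr_failures) - set(prev_failures)
        now_passing  = set(prev_failures) - set(curr_failures)
        changed      = {s for s in set(prev_failures) & set(curr_failures)
                        if prev_failures[s] != curr_failures[s]}  # "#27" -> "#5"

        problems = []

        # total issues (now) <= total issues (previous)
        #                       + number of scripts (now)
        #                       - number of scripts (previous)
        if curr_total > prev_total + (curr_scripts - prev_scripts):
            problems.append("more new issues than new scripts can account for")

        # The change in the claimed totals should track the diff of the
        # failure lists: new failure lines minus lines that moved from
        # failed to passed.  If not, some scripts are in neither list
        # (deleted scripts, usually).
        if curr_total - prev_total != len(new_failures) - len(now_passing):
            problems.append("issue totals do not track the failure-list diff")

        return new_failures, now_passing, changed, problems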

I don't look at the report below the "New Issues (Deferred)" entry, except to check two things:

1. Are any changes reported there?  (My diff tool color codes changes,
   so they are easy to spot.)
2. Did a script failure reported on a previous run, and not in the
   fixed list, move to the deferred list?
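
Item 2 is just set bookkeeping, so here is the same kind of sketch, with the same caveat that the names and inputs are made up for illustration:

    def moved_to_deferred(prev_failures, curr_failures,
                          fixed_list, deferred_list):
        # Scripts that failed on the previous run, are not marked fixed,
        # and no longer appear in the current failure list...
        dropped = set(prev_failures) - set(curr_failures) - set(fixed_list)
        # ...should show up below the "New Issues (Deferred)" entry.
        # Anything left over has simply gone missing and needs chasing.
        return dropped & set(deferred_list), dropped - set(deferred_list)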

The time-consuming piece of this process is digging into and running the scripts that report failures in order to understand the nature of each failure.

When the test system is (mostly) frozen at QA complete, as was largely the case for R2015a, it is easy to track the test run changes as we approach release. When the test system is changing from run to run (scripts added and deleted, truth data changing), the process becomes much less manageable and stretches the test analysis out in time (2 days versus a couple of hours).

The plan to make the test system cross-platform robust should help with some of this -- we'll drop from 100+ failures on Linux to something more manageable. (Of course, these are mostly false failures, but they have to be at least spot checked to make sure we didn't introduce something on Linux that isn't seen on Windows.)

That won't particularly address the problem of late system changes breaking the statistics as release approaches. That's why we need the tests to be in place at QA complete, so that we have a valid metric for progress towards release. (The accelerated schedule for this release -- outside of our control -- hasn't helped on that front for R2016a.)

- DJC

On 08/30/2016 08:03 AM, Cooley, D S. (GSFC-5950) wrote:
Darrel,

Would it be OK for SES to add "Deferred" tests (temporarily) if he needed to?

I don't think this affects the statistics.

Steve

-----Original Message-----
From: Darrel J. Conway [mailto:djci...@gmail.com]
Sent: Tuesday, August 30, 2016 10:55 AM
To: Slojkowski, Steven E. (GSFC-595.0)[OMITRON] <steven.e.slojkow...@nasa.gov>; 
gmat-buildtest@lists.sourceforge.net
Subject: Re: [Gmat-buildtest] Test results: 2016-08-25 
(Catsclaw/RHEL-Linux/GmatConsole-64/M2016a/GCC4.8.5)

Thanks.  Once we have the leap second fix in place (and anything else the 
CCB says to wait for tomorrow), I'll start a RHEL test run.

- DJC


On 08/30/2016 05:41 AM, Slojkowski, Steven E. (GSFC-595.0)[OMITRON] wrote:
I added test cases for the tests discussed on Monday. Some work, others are 
just templates. I will be working on getting these scripts into running shape, 
but I won't add any new ones (at least not this week).

-----Original Message-----
From: Darrel J. Conway [mailto:djci...@gmail.com]
Sent: Saturday, August 27, 2016 9:58 PM
To: gmat-buildtest@lists.sourceforge.net
Cc: Slojkowski, Steven E. (GSFC-595.0)[OMITRON]
Subject: Re: [Gmat-buildtest] Test results: 2016-08-25
(Catsclaw/RHEL-Linux/GmatConsole-64/M2016a/GCC4.8.5)

Test results on the RC2 candidate.  98.91% (another drop), and the reported 
failures all look like they are test scripts committed since the 8/25 run.  
I'll go through the details on Monday.  I suspect they are largely platform 
differences producing false failures.

Expect more whining Monday.  Not sure I have the energy for it on the
weekend!  :-/

- DJC


--
Darrel J. Conway, Ph.D.      Thinking Systems, Inc.
Senior Scientist and CEO     6441 N Camino Libby
Phone: (623) 298-4530        Tucson, AZ 85718
FAX:   (520) 232-2533        www.thinksysinc.com
Cell:  (520) 425-3626        d...@thinksysinc.com
