I don't know how to answer that question directly, so let me explain my
procedure for running the test system. The process goes like this:
* Run the system (about 27 hours for this step; the rest of the process
takes about 2 days in the R2016a release cycle)
* Collect results
* Check the passing percentage and number of tests run. This is where
the issue of test system changes first hits.
o When the percentage drops, that is a flag for me to investigate
why.
o The percentage passed is the number that external folks (the
ones that read my weekly and monthly reports) are watching, so I
end up writing explanations about why the passing percentage
drops as the system gets closer to release. I really don't like
it when that starts sounding like making excuses -- "it's the
test system's fault." Sigh.
* Difference the current report with the previous one, looking at
everything reported in the "All known issues" list.
* Run every new script* that reports an issue, examine the output, and
determine why the script failed.
o * Truth is, I don't run every new script if I can track down a
systematic explanation for the failure -- e.g., 3 scripts with
nearly identical names all reporting a failure at the same line
number ("#24 failed"), matching another set of 3 nearly
identically named scripts that failed with the same reported
failure. Once a pattern is clear, I tend to start spot checking.
Or I get cross-eyed.
* Run every script that shows a change in the failure report. For
example, if "#27 failed" changes to "#5 failed", I look to see why.
* Write up the changes (that is, collect the changes in a spreadsheet
for later comparisons).
o This piece includes noting the total number of scripts. I
expect this comparison to hold (a sketch of these checks
follows this list):
total issues (now) <= total issues (previous)
+ number of scripts (now) - number of scripts (previous)
o I also expect the number of new failures reported to match the
count of scripts newly reporting failures, adjusted for the
number of scripts that changed from failing to passing. In
other words, I check whether the number of new failures tracks
with the number of new lines in the list of failures, adjusted
for the number that moved from failed to passed. If not, I look
for the missing scripts -- what is in neither list?
o In other words, I expect internal consistency in the data
reports. (Deleting scripts breaks this piece, incidentally. It's
better, this late in the cycle, to move them to the deferred list.)
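Since most of this is bookkeeping, here is a minimal Python sketch of
the checks above, assuming each run's "All known issues" section has
been parsed into a dict mapping script name to the reported failure.
The function name, script names, and counts are illustrative only, not
the actual report format:

    # Sketch only: prev/now map script name -> reported failure text.
    def compare_runs(prev, now, prev_script_count, now_script_count,
                     reported_new_failures=None):
        prev_failing, now_failing = set(prev), set(now)

        new_failures = now_failing - prev_failing   # newly reporting an issue
        now_passing = prev_failing - now_failing    # moved from failing to passing
        changed = {s for s in prev_failing & now_failing
                   if prev[s] != now[s]}            # e.g. "#27 failed" -> "#5 failed"

        # Check 1: total issues should grow no faster than the script count.
        issues_ok = (len(now_failing) <=
                     len(prev_failing) + (now_script_count - prev_script_count))

        # Check 2: the "new failures" count in the report summary should match
        # the new lines in the failure list, adjusted for scripts that moved
        # from failing to passing. If not, something is in neither list
        # (usually a deleted script).
        expected_new = len(now_failing) - len(prev_failing) + len(now_passing)
        count_ok = (reported_new_failures is None or
                    reported_new_failures == expected_new)

        return {"new": sorted(new_failures), "fixed": sorted(now_passing),
                "changed": sorted(changed),
                "issues_ok": issues_ok, "count_ok": count_ok}

    # Made-up example:
    prev = {"Cb_Prop_01": "#24 failed", "Force_Drag_03": "#27 failed"}
    now = {"Cb_Prop_01": "#24 failed", "Force_Drag_03": "#5 failed",
           "Force_SRP_04": "#12 failed"}
    print(compare_runs(prev, now, prev_script_count=1200,
                       now_script_count=1205, reported_new_failures=1))

When the checks fail, the leftover names point at the scripts I have to
track down by hand (or at scripts that were deleted rather than
deferred).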
I don't look at the report below the "New Issues (Deferred)" entry,
except to see 2 things:
1. Are any changes reported there? (My diff tool color codes changes,
so they are easy to spot.)
2. Did a script failure reported on a previous run, and not in the
fixed list, move to the deferred list? (A sketch of this check
follows.)
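For item 2, the check amounts to a set difference. The list names below
are assumptions about how the parsed report sections would be held in
memory, not the report's actual layout:

    # Sketch only: the three sets are placeholders for parsed report sections.
    previously_failing = {"Force_Drag_03", "Cb_Prop_02"}
    fixed_list = {"Cb_Prop_02"}
    deferred_list = {"Force_Drag_03"}

    # Anything that failed last run and is neither fixed nor deferred
    # needs a closer look.
    unaccounted = previously_failing - fixed_list - deferred_list
    print("Unaccounted-for failures:", sorted(unaccounted))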
The time-consuming piece of this process is digging into and running
the scripts that report failures in order to understand the nature of
each failure.
When (as was mostly the case for R2015a) the test system is (mostly)
frozen at QA complete, it is easy to track the test run changes as we
approach release. When the test system changes from run to run
(scripts added and deleted, truth data changing from run to run), the
process becomes much less manageable and stretches the test analysis
out in time (2 days vs. a couple of hours).
The plan to make the test system robust across platforms should help
with some of this -- we'll drop from 100+ failures on Linux to something
more manageable. (Of course, these are mostly false failures, but they
have to be at least spot-checked to make sure we didn't introduce
something on Linux that isn't seen on Windows.)
That won't particularly address the timing of system changes breaking
the statistics as release approaches. That's why we need tests to be in
place at QA complete, so that we have a valid metric for progress
towards release. (The accelerated schedule for this release -- outside
of our control -- hasn't helped on that front for R2016a.)
- DJC
On 08/30/2016 08:03 AM, Cooley, D S. (GSFC-5950) wrote:
Darrel,
Would it be OK for SES to add "Deferred" tests (temporarily) if he needed to?
I don't think this affects the statistics.
Steve
-----Original Message-----
From: Darrel J. Conway [mailto:djci...@gmail.com]
Sent: Tuesday, August 30, 2016 10:55 AM
To: Slojkowski, Steven E. (GSFC-595.0)[OMITRON] <steven.e.slojkow...@nasa.gov>;
gmat-buildtest@lists.sourceforge.net
Subject: Re: [Gmat-buildtest] Test results: 2016-08-25
(Catsclaw/RHEL-Linux/GmatConsole-64/M2016a/GCC4.8.5)
Thanks. Once we have the leap second issue in place (and anything else CCB
says to wait for tomorrow), I'll start a RHEL test run.
- DJC
On 08/30/2016 05:41 AM, Slojkowski, Steven E. (GSFC-595.0)[OMITRON] wrote:
I added test cases for the tests discussed on Monday. Some work, others are
just templates. I will be working on getting these scripts into running shape,
but I won't add any new ones (at least not this week).
-----Original Message-----
From: Darrel J. Conway [mailto:djci...@gmail.com]
Sent: Saturday, August 27, 2016 9:58 PM
To: gmat-buildtest@lists.sourceforge.net
Cc: Slojkowski, Steven E. (GSFC-595.0)[OMITRON]
Subject: Re: [Gmat-buildtest] Test results: 2016-08-25
(Catsclaw/RHEL-Linux/GmatConsole-64/M2016a/GCC4.8.5)
Test results on the RC2 candidate. 98.91% (another drop), and the reported
failures all look like they are test scripts committed since the 8/25 run.
I'll go through the details on Monday. I suspect they are largely
platform differences producing false failures.
Expect more whining Monday. Not sure I have the energy for it on the
weekend! :-/
- DJC
--
Darrel J. Conway, Ph.D. Thinking Systems, Inc.
Senior Scientist and CEO 6441 N Camino Libby
Phone: (623) 298-4530 Tucson, AZ 85718
FAX: (520) 232-2533 www.thinksysinc.com
Cell: (520) 425-3626 d...@thinksysinc.com