Re: [Lustre-discuss] [Discuss] coverage measurement at 2012 09 15

Andreas Dilger Tue, 02 Oct 2012 15:45:22 -0700

On 2012-10-02, at 2:19 PM, Cory Spitz wrote:
>> Are the percentages of code coverage getting better or worse?
> 
> I don't know exactly, but based on the information that Robert Read
> shared at LUG '09, sanity was netting "60-70% coverage of core Lustre
> modules" (http://wiki.lustre.org/images/4/4f/RobertReadTalk1.pdf).


I was wondering that also, but according to the original URL from Roman, the 
mechanism for measuring code coverage was changed in the recent runs, so I 
don't know if it is possible to do head-to-head comparisons.

>> I can definitely imagine that many error handling code paths (e.g. checking 
>> for allocation failures) would not be exercised without specific changes 
>> (see e.g. my unlanded patch to fix the OBD_ALLOC() failure injection code).
> 
> Cray has started looking at testing w/forced memory allocation failures
> from the Linux fault injection framework
> (http://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt).

I've seen this, but hadn't actually had time to look into it.  I'm happy to see 
you taking the initiative to try out this new avenue for testing.

Another related (though different) set of tests would be to run on a client or
server booted with a smaller amount of RAM (say 512MB-1GB) and see what
problems appear.  I suspect there are a lot of hash tables, constants, and
the like that do not scale properly with RAM size.
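
To make the scaling concern concrete, here is a minimal sketch (not Lustre
code) of sizing a hash table from the amount of RAM rather than from a fixed
constant.  The function name, the 1/1024 RAM budget, the 8-byte bucket size,
and the 4/20-bit limits are all assumptions chosen for illustration.

```python
def hash_bits_for_ram(ram_bytes, ram_divisor=1024, bytes_per_bucket=8,
                      min_bits=4, max_bits=20):
    """Return log2 of the bucket count for a table limited to
    ram_bytes / ram_divisor bytes (hypothetical policy, for illustration)."""
    budget = ram_bytes // ram_divisor            # bytes the table may use
    buckets = max(budget // bytes_per_bucket, 1)
    bits = buckets.bit_length() - 1              # floor(log2(buckets))
    return max(min_bits, min(bits, max_bits))    # clamp to sane limits


def table_bytes(bits, bytes_per_bucket=8):
    """Memory consumed by a table with 2**bits buckets."""
    return (1 << bits) * bytes_per_bucket


if __name__ == "__main__":
    for ram in (512 * 2**20, 2**30, 64 * 2**30):
        bits = hash_bits_for_ram(ram)
        print(ram, bits, table_bytes(bits))
```

A table hard-coded at 20 bits would take 8MB under these assumptions no matter
how little RAM the node has, which is the kind of constant a low-memory test
run should flush out.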

> As we make progress we'll open tickets and push patches.  I expect to
> find problems ;)

Yes, no doubt.  It is probably worthwhile to check the CEA Coverity patches 
before submitting anything new, in case those failures are already fixed there.

It is probably also worthwhile to submit a patch that removes the equivalent 
fault-injection code from the Lustre code paths, since it is pure runtime 
overhead for every memory allocation at this point.
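
As a rough model of why the built-in injection costs something on every
allocation, the sketch below wraps an allocator with a failure check that runs
even when injection is disabled.  This is only an illustration: the class and
function names are hypothetical, and it is neither the OBD_ALLOC() code nor
the kernel fault injection framework.

```python
import errno
import random


class FaultInjector:
    """Hypothetical allocation wrapper that fails about 1 in fail_rate calls.
    fail_rate == 0 disables injection."""

    def __init__(self, fail_rate=0, seed=None):
        self.fail_rate = fail_rate
        self.rng = random.Random(seed)
        self.checks = 0      # how many times the injection check ran
        self.injected = 0    # how many failures were injected

    def alloc(self, size):
        # This check is paid on every allocation, even with injection off.
        self.checks += 1
        if self.fail_rate and self.rng.randrange(self.fail_rate) == 0:
            self.injected += 1
            return None      # simulated allocation failure
        return bytearray(size)


def create_object(injector):
    """Caller whose error path only runs when the allocation fails."""
    buf = injector.alloc(64)
    if buf is None:
        return -errno.ENOMEM
    return 0


if __name__ == "__main__":
    inj = FaultInjector(fail_rate=10, seed=1)
    results = [create_object(inj) for _ in range(1000)]
    print(inj.checks, inj.injected, results.count(0))
```

With fail_rate=0 the error path in create_object() is never taken, which is
why such paths show up as uncovered unless failures are injected from
somewhere.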

> Andreas, were you talking about http://review.whamcloud.com/#change,3037?  If 
> not, what ticket were you referring to?

Yes, that was it.  This patch has a few minor fixes that I found in my testing, 
and fixes the error messages, but there is no point in fixing the fault 
injection code anymore.

Cheers, Andreas

> On 09/29/2012 07:24 AM, Dilger, Andreas wrote:
>> Hi Roman,
>> The coverage data is interesting. It would be even more useful to be able to 
>> compare it to the previous code coverage run, if they used the same method 
>> for measuring coverage (the new report states that the method has changed 
>> and reduced coverage).
>> 
>> Are the percentages of code coverage getting better or worse?  Are there 
>> particular areas of the code that have poor coverage that could benefit from 
>> some focussed attention with new tests?
>> 
>> I can definitely imagine that many error handling code paths (e.g. checking 
>> for allocation failures) would not be exercised without specific changes 
>> (see e.g. my unlanded patch to fix the OBD_ALLOC() failure injection code). 
>> 
>> Running a test with periodic random allocation failures enabled and fixing 
>> the resulting bugs would improve coverage, though not in a systematic way 
>> that could be measured/repeated. Still, this would find a class of 
>> hard-to-find bugs.
>> 
>> Similarly, running racer for extended periods is a good form of coverage 
>> generation, even if not systematic/repeatable. I think the racer code could 
>> be improved/extended by adding racer scripts that are Lustre-specific or 
>> exercise new functionality (e.g. "lfs setstripe", setfattr, getfattr, 
>> setfacl, getfacl). Running multiple racer instances on multiple 
>> clients/mounts and throwing recovery into the mix would definitely find new 
>> bugs.
>> 
>> In general, having the code coverage is a good starting point, but it isn't 
>> necessarily useful if nothing is done to improve the coverage of the tests 
>> as a result. 
>> 
>> Cheers, Andreas
>> 
>> On 2012-09-20, at 7:21, Roman Grigoryev <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> next coverage measurement published,
>>> please see
>>> http://www.opensfs.org/foswiki/bin/view/Lustre/CodeCoverage20120915
>>> 
>>> Entrance page http://www.opensfs.org/foswiki/bin/view/Lustre/CodeCoverage
>>> 
>>> 
>>> Thanks,
>>>   Roman
>>> _______________________________________________
>>> discuss mailing list
>>> [email protected]
>>> http://lists.opensfs.org/listinfo.cgi/discuss-opensfs.org
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss


Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/



