Re: GCC selftest improvements

David Malcolm Thu, 24 Oct 2019 23:18:21 -0700

On Thu, 2019-10-24 at 20:50 +0000, Andrew Dean via gcc wrote:
> TLDR: I'd like to propose adding a dependency on a modern unit
> testing framework to make it easier to write unit tests within GCC.
> Before I spend much more time on it, what sort of buy-in should I
> get? Are there any people in particular I should work more closely
> with as I make this change?
>  
> Terminology: Within GCC, there are two types of tests in place: unit
> tests and regression tests. The unit tests have been written with a
> home-grown selftest framework and run as part of the build process.
> Any failure of a unit test results in no compiler being produced.
> The regression tests, on the other hand, run after build, and use the
> separate DejaGnu framework. In this email, I am only concerning
> myself with the unit tests, and throughout the remainder of the
> email, any mention of tests refers to these.
>  
> Working on GCC, I wanted to add some new unit tests to my feature as
> I went, but I noticed that there is a good deal of friction involved.
> Right now, adding new unit tests requires writing the test method,
> then modifying a second place in the code to call said test method,
> repeating as necessary until getting all the way to either the
> selftest.c file or the target hook. There is also no way to do test
> setup/teardown automatically. Everything is manual.
>  
> I'd like to propose adding a dependency on a modern open-source unit
> testing framework as an enhancement to the current self test system.
> I have used Catch2 (https://github.com/catchorg/Catch2, Boost
> Software License 1.0) with great success in the past. I experimented
> with adding it to GCC and converting a handful of tests to use
> Catch2. Although I only converted a small number of tests, I didn't
> see any performance impact during selftest. As a bonus, while doing
> so, I actually found that one test that I had written previously
> wasn't actually being run, because I had failed to manually call it.
>  
> Some nice things that Catch2 provides are better error reporting (see
> below for a comparison), ease of adding new tests (just include the
> header and write a TEST_CASE(), as opposed to the manual plumbing
> required right now), extension points for adding custom comparisons
> (I could see this being very useful to expand on the current rtl test
> macros), and the ability to run a subset of the tests without
> recompiling. It is also easy to integrate Catch2 with the existing
> self-test framework.
>  
> If this path seems useful to others, I'm happy to pursue it further.
> The work items I see are:
>  
> 1. Convert more tests to verify the claim that build performance is
> not degraded
> 2. Update the docs to list Catch2 as the new recommended way to write
> unit tests
> 3. If all of the target self-tests are converted, then we can remove
> the target test hook. Similar for the lang test hook.
>  
> One thing that would make Catch2 even more of a slam dunk would be
> enabling exceptions for the check builds. Then, running
> the unit tests could report multiple failures at the same time
> instead of just aborting at the first one. That said, even without
> enabling exceptions, Catch2 is on par with the current selftest in
> terms of terminating at the first failure.
>  
> Another option is to use a test framework that doesn't use
> exceptions, such as Google Test (https://github.com/google/googletest
> , BSD 3-Clause "New" or "Revised" License). I personally think Catch2
> is more flexible, or I would lead with Google Test. For example, in
> Catch2, shared setup is done in place with the tests themselves, with
> each subtest being a nested SECTION, whereas in GTest, you have to
> write a test class that derives from ::testing::Test and overrides
> SetUp(). In
> addition, the sections in Catch2 can be nested further, allowing
> several related tests to build on each other. 
>  
> Here is some sample output for the case where all the tests are
> passing:
> ===============================================================================
> All tests passed (25 assertions in 5 test cases)
>  
> And here is the output when a test fails:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> is a Catch v2.9.2 host application.
> Run with -? for options
>  
> -------------------------------------------------------------------------------
> test_set_range
> -------------------------------------------------------------------------------
> ../../gcc/bitmap.c:2661
> ...............................................................................
> ../../gcc/bitmap.c:2668: FAILED:
>   REQUIRE( 6 == bitmap_count_bits (b) )
> with expansion:
>   6 == 5
>  
> Catch will terminate because it needed to throw an exception.
> The message was: Test failure requires aborting test!
> terminate called without an active exception
> ../../gcc/bitmap.c:2668: FAILED:
>   {Unknown expression after the reported line}
> due to a fatal error condition:
>   SIGABRT - Abort (abnormal termination) signal
> ===============================================================================
> test cases: 2 | 1 passed | 1 failed
> assertions: 5 | 3 passed | 2 failed
> cc1: internal compiler error: Aborted
> <long callstack>
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
>  
> (Note that at the moment it doesn't know the name of our application
> or it would have prefixed "is a Catch..." with our app name).
>  
> Compare that to the output of the current test framework:
> ../../gcc/bitmap.c:2669: test_set_range: FAIL: ASSERT_EQ ((6), (bitmap_count_bits (b)))
> cc1: internal compiler error: in fail, at selftest.c:47
> /bin/bash ../../gcc/../move-if-change tmp-macro_list macro_list
> echo timestamp > s-macro_list
> <long callstack>
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.


Thanks for your email, it looks interesting.  Is your code somewhere we
can see it? 

I'm the author of gcc's selftest framework (and I use it heavily e.g.
for testing the diagnostics subsystem [1]).

It went through substantial changes during review.

I looked over my notes; for reference, here's a summary of the history
of the patches:

v1: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00765.html
  "[PATCH 00/17] RFC: Addding a unit testing framework to gcc"
     Used Google Test framework, was a dummy "frontend"

v2: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01224.html
  "[PATCH/RFC]: unittesting v2: as a plugin (was Re: [PATCH 00/17] RFC:
Addding a unit testing framework to gcc)"
    Still Google Test, as a plugin rather than a frontend

v3: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02947.html
  "[PATCH 00/16] Unit tests framework (v3)"
    Still Google Test, as a plugin built by plugin.exp within DejaGnu
tests, with a custom gtest reporter
    Some discussion about gtest:
      https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03215.html
    
v4: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02379.html
  "[PATCH 00/15] Unittests framework v4: -fself-test"
    Done via "-fself-test" via compiling a dummy file in DejaGnu tests
    I believe it was at this point that I switched to a custom API that
resembles gtest, rather than gtest itself.

v5: https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00082.html
  "[PATCH 00/21] Add -fself-test framework for fast, early unit-testing 
(unittests v5)"
    Done via "-fself-test" at each of the 3 stages of bootstrap.

v6: https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00210.html
  "[PATCH 00/16] v6 of -fself-test/unit-testing patch"
    Switched to "abort on first failure"
    Eliminated runner class, and switched from self-registering tests to
manual test invocation

v7: https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00298.html
   "[PATCH] Selftest framework (v7)"
   (one combined patch)

v8: approved; committed v8 to trunk as r237144 (for gcc 7):
   https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00410.html

Notable followups:

2016-07-11:
  * r238209: "Support running the selftests under valgrind"
    * https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00437.html

2017-12-11:
  * r255563: "Expensive selftests: torture testing for fix-it boundary
conditions (PR c/82050)"
    * https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02459.html
    (some tests run as a DejaGnu-built plugin)

2018-04-30:
  * r259782: "selftest: remove "Yoda ordering" in assertions"
    * https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01323.html

2018-10-17:
  * r265240: "Run selftests for C++ as well as C"
    * https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00802.html

I think the consensus during review was that I was over-engineering
things, and that each iteration from v1 to v8 made the code simpler and
involved less C++ "magic", closer to C.  Whether that's still the
consensus I don't know.  Different people within the GCC dev community
have different comfort levels with C++, and my initial version (using
gtest) was probably too "magic" for some.  Maybe people here are more
comfortable with C++ now?

GCC has some rather unusual requirements: we support a great many build
configurations, some of which are rather primitive (for example,
requiring just C++98 with exceptions disabled), because we want to be
bootstrappable on relatively "ancient" configurations.  IIRC
auto-registration of tests requires that the build configuration have a
sufficiently sane implementation of C++; having globals with
non-trivial ctors tends to be problematic when dealing with early
implementations of C++.
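
To make the trade-off concrete, here is a minimal sketch of the two
registration styles.  All names are hypothetical and this is not the
actual selftest API; it only illustrates why self-registration depends
on globals with non-trivial ctors running before main, whereas manual
invocation needs nothing beyond plain function calls.

```cpp
#include <cstddef>

// Hypothetical names throughout; not GCC's actual selftest API.
typedef void (*test_fn) ();

// Style 1: self-registration.  Each test defines a global object whose
// constructor links it into a list during static initialization.
struct registered_test
{
  registered_test (const char *name, test_fn fn)
    : m_name (name), m_fn (fn), m_next (s_head)
  {
    s_head = this;  // runs before main (), via a non-trivial ctor
  }

  const char *m_name;
  test_fn m_fn;
  registered_test *m_next;

  // A plain pointer with a constant initializer, so it is already
  // NULL before any of the constructors above run.
  static registered_test *s_head;
};

registered_test *registered_test::s_head = NULL;

static int tests_run = 0;

static void test_addition () { ++tests_run; }
static void test_subtraction () { ++tests_run; }

// The "magic": merely defining these globals registers the tests.
static registered_test reg_addition ("test_addition", test_addition);
static registered_test reg_subtraction ("test_subtraction",
                                        test_subtraction);

// Walk the list and run every registered test; returns the count.
int
run_registered_tests ()
{
  int count = 0;
  for (registered_test *t = registered_test::s_head; t; t = t->m_next)
    {
      t->m_fn ();
      ++count;
    }
  return count;
}

// Style 2: manual invocation.  No global ctors are involved, but every
// new test must be added here by hand, and a forgotten call means the
// test silently never runs.
int
run_tests_manually ()
{
  test_addition ();
  test_subtraction ();
  return 2;
}
```

The second style is what the framework has used since v6; the first is
roughly what gtest-style TEST macros expand to.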

Personally I don't find the manual registration of tests to be a pain,
but it would be nice to have more readable errors on failures.  There's
probably a case for more verbose test output.  (Generally I just run
"make selftest-gdb" immediately on failures; the issue is when a test
suddenly starts failing on a build I don't have access to.)
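
As a rough illustration of what more readable failures could look like
without exceptions, here is a sketch of an equality assertion that
prints both the source expressions and their values before aborting.
The names SKETCH_ASSERT_EQ and describe_eq_failure are made up for this
example and are not part of selftest.h; using <sstream> is also just an
assumption made for brevity.

```cpp
#include <cstdio>
#include <cstdlib>
#include <sstream>
#include <string>

// Build a failure message showing the expressions as written and the
// values they evaluated to.  Hypothetical helper, not GCC code.
template <typename T1, typename T2>
std::string
describe_eq_failure (const char *file, int line,
                     const char *expected_expr, const char *actual_expr,
                     const T1 &expected, const T2 &actual)
{
  std::ostringstream out;
  out << file << ":" << line << ": FAIL: "
      << expected_expr << " == " << actual_expr
      << "\n  with expansion: " << expected << " == " << actual;
  return out.str ();
}

// Abort on the first failure, with no exceptions involved.  Note that
// this sketch evaluates its arguments a second time on the failure
// path, so they should be free of side effects.
#define SKETCH_ASSERT_EQ(EXPECTED, ACTUAL)                              \
  do                                                                    \
    {                                                                   \
      if (!((EXPECTED) == (ACTUAL)))                                    \
        {                                                               \
          std::fprintf (stderr, "%s\n",                                 \
                        describe_eq_failure (__FILE__, __LINE__,        \
                                             #EXPECTED, #ACTUAL,        \
                                             (EXPECTED),                \
                                             (ACTUAL)).c_str ());       \
          std::abort ();                                                \
        }                                                               \
    }                                                                   \
  while (0)
```

With something along these lines, a failing check reports the expanded
values in the log itself, which helps when the failure happens on a
build where running a debugger isn't an option.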

I suspect that exceptions would be a deal-breaker; does Catch2 support
-fno-exceptions?

As for setup/teardown, I've been able to do that "manually" using
RAII-style classes in test functions.
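
A minimal sketch of that pattern, using a made-up piece of global state
(g_tab_width) and a made-up fixture class (scoped_tab_width) rather
than anything from the GCC sources: the constructor does the setup, and
the destructor does the teardown when the test function returns.

```cpp
// Hypothetical global state that a test needs to override temporarily.
static int g_tab_width = 8;

// RAII fixture: the ctor performs setup, the dtor performs teardown.
class scoped_tab_width
{
public:
  explicit scoped_tab_width (int width)
    : m_saved (g_tab_width)
  {
    g_tab_width = width;        // setup
  }

  ~scoped_tab_width ()
  {
    g_tab_width = m_saved;      // teardown, on every exit path
  }

private:
  int m_saved;

  // Non-copyable, written C++98-style (declared but never defined).
  scoped_tab_width (const scoped_tab_width &);
  scoped_tab_width &operator= (const scoped_tab_width &);
};

// Hypothetical code under test that depends on the global.
static int
columns_for_tabs (int num_tabs)
{
  return num_tabs * g_tab_width;
}

// A test function: the fixture lives on the stack, so no framework
// support for setup/teardown is needed.
bool
test_columns_with_narrow_tabs ()
{
  scoped_tab_width fixture (4);
  return columns_for_tabs (3) == 12;
}
```

Once the test function returns, g_tab_width is back to its original
value, so later tests see unmodified state.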

Thanks again for your email; hope this is constructive.
Dave

[1] see e.g. the selftests in gcc/input.c and
gcc/diagnostic-show-locus.c
