Re: [PATCH v9 1/2] Add condition coverage (MC/DC)

Jan Hubicka Thu, 22 Feb 2024 05:26:21 -0800

Hello,
> This patch adds support in gcc+gcov for modified condition/decision
> coverage (MC/DC) with the -fcondition-coverage flag. MC/DC is a type of
> test/code coverage and it is particularly important for safety-critical
> applicaitons in industries like aviation and automotive. Notably, MC/DC
> is required or recommended by:
> 
>     * DO-178C for the most critical software (Level A) in avionics.
>     * IEC 61508 for SIL 4.
>     * ISO 26262-6 for ASIL D.
> 
> From the SQLite webpage:
> 
>     Two methods of measuring test coverage were described above:
>     "statement" and "branch" coverage. There are many other test
>     coverage metrics besides these two. Another popular metric is
>     "Modified Condition/Decision Coverage" or MC/DC. Wikipedia defines
>     MC/DC as follows:
> 
>         * Each decision tries every possible outcome.
>         * Each condition in a decision takes on every possible outcome.
>         * Each entry and exit point is invoked.
>         * Each condition in a decision is shown to independently affect
>           the outcome of the decision.
> 
>     In the C programming language where && and || are "short-circuit"
>     operators, MC/DC and branch coverage are very nearly the same thing.
>     The primary difference is in boolean vector tests. One can test for
>     any of several bits in bit-vector and still obtain 100% branch test
>     coverage even though the second element of MC/DC - the requirement
>     that each condition in a decision take on every possible outcome -
>     might not be satisfied.
> 
>     https://sqlite.org/testing.html#mcdc
> 
> MC/DC comes in different flavours, the most important being unique
> cause MC/DC and masking MC/DC - this patch implements masking MC/DC,
> which is works well with short circuiting semantics, and according to
> John Chilenski's "An Investigation of Three Forms of the Modified
> Condition Decision Coverage (MCDC) Criterion" (2001) is as good as
> unique cause at catching bugs.
> 
> Whalen, Heimdahl, and De Silva "Efficient Test Coverage Measurement for
> MC/DC" describes an algorithm for determining masking from an AST walk,
> but my algorithm figures this out from analyzing the control flow graph.
> The CFG is considered a binary decision diagram and an evaluation
> becomes a path through the BDD, which is recorded. Certain paths will
> mask ("null out") the contribution from earlier path segments, which can
> be determined by finding short circuit endpoints. Masking is the
> short circuiting of terms in the reverse-ordered Boolean function, and
> the masked terms do not affect the decision like short-circuited
> terms do not affect the decision.
> 
> A tag/discriminator mapping from gcond->uid is created during
> gimplification and made available through the function struct. The
> values are unimportant as long as basic conditions constructed from a
> single Boolean expression are given the same identifier. This happens in
> the breaking down of ANDIF/ORIF trees, so the coverage generally works
> well for frontends that create such trees.
> 
> Like Whalen et al this implementation records coverage in fixed-size
> bitsets which gcov knows how to interpret. This takes only a few bitwise
> operations per condition and is very fast, but comes with a limit on the
> number of terms in a single boolean expression; the number of bits in a
> gcov_unsigned_type (which is usually typedef'd to uint64_t). For most
> practical purposes this is acceptable, and by default a warning will be
> issued if gcc cannot instrument the expression.  This is a practical
> limitation in the implementation, not the algorithm, so support for more
> conditions can be added by also introducing arbitrary-sized bitsets.
> 
> In action it looks pretty similar to the branch coverage. The -g short
> opt carries no significance, but was chosen because it was an available
> option with the upper-case free too.
> 
> gcov --conditions:
> 
>         3:   17:void fn (int a, int b, int c, int d) {
>         3:   18:    if ((a && (b || c)) && d)
> conditions covered 3/8
> condition  0 not covered (true false)
> condition  1 not covered (true)
> condition  2 not covered (true)
> condition  3 not covered (true)
>         1:   19:        x = 1;
>         -:   20:    else
>         2:   21:        x = 2;
>         3:   22:}
> 
> gcov --conditions --json-format:
> 
> "conditions": [
>     {
>         "not_covered_false": [
>             0
>         ],
>         "count": 8,
>         "covered": 3,
>         "not_covered_true": [
>             0,
>             1,
>             2,
>             3
>         ]
>     }
> ],
> 
> Expressions with constants may be heavily rewritten before it reaches
> the gimplification, so constructs like int x = a ? 0 : 1 becomes
> _x = (_a == 0). From source you would expect coverage, but it gets
> neither branch nor condition coverage. The same applies to expressions
> like int x = 1 || a which are simply replaced by a constant.
> 
> The test suite contains a lot of small programs and functions. Some of
> these were designed by hand to test for specific behaviours and graph
> shapes, and some are previously-failed test cases in other programs
> adapted into the test suite.
> 
> gcc/ChangeLog:
> 
>       * builtins.cc (expand_builtin_fork_or_exec): Check
>         condition_coverage_flag.
>       * collect2.cc (main): Add -fno-condition-coverage to OBSTACK.
>       * common.opt: Add new options -fcondition-coverage and
>         -Wcoverage-too-many-conditions.
>       * doc/gcov.texi: Add --conditions documentation.
>       * doc/invoke.texi: Add -fcondition-coverage documentation.
>       * function.cc (allocate_struct_function): Init cond_uids.
>       * function.h (struct function): Add cond_uids.
>       * gcc.cc: Link gcov on -fcondition-coverage.
>       * gcov-counter.def (GCOV_COUNTER_CONDS): New.
>       * gcov-dump.cc (tag_conditions): New.
>       * gcov-io.h (GCOV_TAG_CONDS): New.
>       (GCOV_TAG_CONDS_LENGTH): New.
>       (GCOV_TAG_CONDS_NUM): New.
>       * gcov.cc (class condition_info): New.
>       (condition_info::condition_info): New.
>       (condition_info::popcount): New.
>       (struct coverage_info): New.
>       (add_condition_counts): New.
>       (output_conditions): New.
>       (print_usage): Add -g, --conditions.
>       (process_args): Likewise.
>       (output_intermediate_json_line): Output conditions.
>       (read_graph_file): Read condition counters.
>       (read_count_file): Likewise.
>       (file_summary): Print conditions.
>       (accumulate_line_info): Accumulate conditions.
>       (output_line_details): Print conditions.
>       * gimplify.cc (next_cond_uid): New.
>       (reset_cond_uid): New.
>       (shortcut_cond_r): Set condition discriminator.
>       (tag_shortcut_cond): New.
>       (shortcut_cond_expr): Set condition discriminator.
>       (gimplify_cond_expr): Likewise.
>       (gimplify_function_tree): Call reset_cond_uid.
>       * ipa-inline.cc (can_early_inline_edge_p): Check
>         condition_coverage_flag.
>       * ipa-split.cc (pass_split_functions::gate): Likewise.
>       * passes.cc (finish_optimization_passes): Likewise.
>       * profile.cc (struct condcov): New declaration.
>       (cov_length): Likewise.
>       (cov_blocks): Likewise.
>       (cov_masks): Likewise.
>       (cov_maps): Likewise.
>       (cov_free): Likewise.
>       (instrument_decisions): New.
>       (read_thunk_profile): Control output to file.
>       (branch_prob): Call find_conditions, instrument_decisions.
>       (init_branch_prob): Add total_num_conds.
>       (end_branch_prob): Likewise.
>       * tree-core.h (struct tree_exp): Add condition_uid.
>       * tree-profile.cc (struct conds_ctx): New.
>       (CONDITIONS_MAX_TERMS): New.
>       (EDGE_CONDITION): New.
>       (topological_cmp): New.
>       (index_of): New.
>       (single_p): New.
>       (single_edge): New.
>       (contract_edge_up): New.
>       (struct outcomes): New.
>       (conditional_succs): New.
>       (condition_index): New.
>       (condition_uid): New.
>       (masking_vectors): New.
>       (emit_assign): New.
>       (emit_bitwise_op): New.
>       (make_top_index_visit): New.
>       (make_top_index): New.
>       (paths_between): New.
>       (struct condcov): New.
>       (cov_length): New.
>       (cov_blocks): New.
>       (cov_masks): New.
>       (cov_maps): New.
>       (cov_free): New.
>       (find_conditions): New.
>       (struct counters): New.
>       (find_counters): New.
>       (resolve_counter): New.
>       (resolve_counters): New.
>       (instrument_decisions): New.
>       (tree_profiling): Check condition_coverage_flag.
>       (pass_ipa_tree_profile::gate): Likewise.
>       * tree.h (SET_EXPR_UID): New.
>       (EXPR_COND_UID): New.
> 
> libgcc/ChangeLog:
> 
>       * libgcov-merge.c (__gcov_merge_ior): New.
> 
> gcc/testsuite/ChangeLog:
> 
>       * lib/gcov.exp: Add condition coverage test function.
>       * g++.dg/gcov/gcov-18.C: New test.
>       * gcc.misc-tests/gcov-19.c: New test.
>       * gcc.misc-tests/gcov-20.c: New test.
>       * gcc.misc-tests/gcov-21.c: New test.
>       * gcc.misc-tests/gcov-22.c: New test.
>       * gcc.misc-tests/gcov-23.c: New test.


> diff --git a/gcc/function.cc b/gcc/function.cc
> index 89841787ff8..e57d0488c7a 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -4793,6 +4793,7 @@ allocate_struct_function (tree fndecl, bool abstract_p)
>  
>    cfun = ggc_cleared_alloc<function> ();
>  
> +  cfun->cond_uids = hash_map <gcond*, unsigned>::create_ggc ();

There are a lot of struct functions created during WPA ICF stage, where
this hash will be completely unused.  I think you can initialize it to
NULL here, allocate it at the gimplification time only and also
deallocate in free_after_compilation.

It would be good to arrange the patch to do nothing unless the condition
coverage command line option is used.  At the moment it seems you
compute conditional uids just to ignore them later.

Also at the moment the hash will dangle pointers to statements that was
optimized out. There are two options. Either initialize it with
GTY((cache)) which will make garbage collector to remove entries to
statements not reachable otherwise.

Other option would be move it out of gabrage collector completely and 
extend gsi_remove to also remove entry in the hash just as it does so
This should also make it possible to move it out of ggc.
for EH (see calls to remove_stmt_from_eh_lp).
> @@ -134,6 +136,33 @@ public:
>    vector<unsigned> lines;
>  };
>  
> +/* Describes a single conditional expression and the (recorded) conditions
> +   shown to independently affect the outcome.  */
> +class condition_info
> +{
> +public:
> +  condition_info ();
> +
> +  int popcount () const;
> +
> +  /* Bitsets storing the independently significant outcomes for true and 
> false,
> +   * respectively.  */
Extra * here.
> +  gcov_type_unsigned truev;
> +  gcov_type_unsigned falsev;
> +
> +  /* Number of terms in the expression; if (x) -> 1, if (x && y) -> 2 etc.  
> */
> +  unsigned n_terms;
> +};

> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 342e43a7f25..591cb50193c 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -71,6 +71,24 @@ along with GCC; see the file COPYING3.  If not see
>  #include "context.h"
>  #include "tree-nested.h"
>  
By coding style there should be acomment what nextuid is used for.
> +static unsigned nextuid = 1;
> +/* Get a fresh identifier for a new condition expression.  This is used for
> +   condition coverage.  */
> +static unsigned
> +next_cond_uid ()
> +{
> +    return nextuid++;
> +}
> +/* Reset the condition uid to the value it should have when compiling a new
> +   function.  0 is already the default/untouched value, so start at non-zero.
> +   A valid and set id should always be > 0.  This is used for condition
> +   coverage.  */
> +static void
> +reset_cond_uid ()
> +{
> +    nextuid = 1;
> +}
> +
>  /* Hash set of poisoned variables in a bind expr.  */

> +/* Given a multi-term condition (ANDIF, ORIF), walk the predicate and tag 
> every
> +   term with uid.  When two basic conditions share the uid discriminator when
> +   they belong to the same predicate, which used by the condition coverage.
> +   Doing this as an explicit step makes for a simpler implementation than
> +   weaving it into the splitting code as the splitting code eventually 
> reaches
> +   back up to gimplfiy_expr which makes bookkeeping complicated.  */

Coding style also asks to document parameters (and reffer to them in
uppercase, like PRED/CONDITION_UID)
> +static void
> +tag_shortcut_cond (tree pred, unsigned condition_uid)
> +{
> +  if (TREE_CODE (pred) == TRUTH_ANDIF_EXPR
> +      || TREE_CODE (pred) == TRUTH_ORIF_EXPR)
> +    {
> +      tree fst = TREE_OPERAND (pred, 0);
> +      tree lst = TREE_OPERAND (pred, 1);
> +
> +      if (TREE_CODE (fst) == TRUTH_ANDIF_EXPR
> +       || TREE_CODE (fst) == TRUTH_ORIF_EXPR)
> +     tag_shortcut_cond (fst, condition_uid);
> +      else if (TREE_CODE (fst) == COND_EXPR)
> +     SET_EXPR_UID (fst, condition_uid);
> +
> +      if (TREE_CODE (lst) == TRUTH_ANDIF_EXPR
> +       || TREE_CODE (lst) == TRUTH_ORIF_EXPR)
> +     tag_shortcut_cond (lst, condition_uid);
> +      else if (TREE_CODE (lst) == COND_EXPR)
> +     SET_EXPR_UID (lst, condition_uid);
> +    }
> +}
>  /* Given a conditional expression EXPR with short-circuit boolean
>     predicates using TRUTH_ANDIF_EXPR or TRUTH_ORIF_EXPR, break the
>     predicate apart into the equivalent sequence of conditionals.  */
>  
>  static tree
> -shortcut_cond_expr (tree expr)
> +shortcut_cond_expr (tree expr, unsigned condition_uid)
Similarly here CONDITION_UID should be documented.

> +
> +/* Compare two basic blocks by their order in the expression i.e. for (a || 
> b)
> +   then topological_cmp (a, b, ...) < 0.  The result is undefined if lhs, rhs
> +   belong to different expressions.  The top_index argument should be the
> +   top_index vector from ctx.  */
Please use LHS/RHS and TOP_INDEX in commants (in all functions you aded).
It also seems that you are not releasing the hash after its use.  It
leaks points to gimple statments that are removed, so it may 

The patch look OK to me with the hash memory management change above.
I really do apologize for being so slow on the reviews - it is an
interesting feature and I should have handled it faster.

Honza

Re: [PATCH v9 1/2] Add condition coverage (MC/DC)

Reply via email to