Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]

David Malcolm via Gcc-patches Thu, 07 Sep 2023 10:30:00 -0700

On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote:

> Hi Dave,


Hi Eric, thanks for the patch.

> 
> Recently I've been working on symbolic value support for the reference
> count checker. I've attached a patch for it below; let me know it looks
> OK for trunk. Thanks!
> 
> Best,
> Eric
> 
> ---
> 
> This patch enhances the reference count checker in the CPython plugin by
> adding support for symbolic values. Whereas previously we were only able
> to check the reference count of PyObject* objects created in the scope
> of the function; we are now able to emit diagnostics on reference count
> mismatch of objects that were, for example, passed in as a function
> parameter.
> 
> rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but 
> ob_refcnt field is N + ‘2’
>     6 |   return obj;
>       |          ^~~

[...snip...]

>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> 
> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
> b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index bf1982e79c3..d7ecd7fce09 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> @@ -314,17 +314,20 @@ public:
>    {
>      diagnostic_metadata m;
>      bool warned;
> -    // just assuming constants for now
> -    auto actual_refcnt
> -     = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> -    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant 
> ();
> -    warned = warning_meta (rich_loc, m, get_controlling_option (),
> -                        "expected %qE to have "
> -                        "reference count: %qE but ob_refcnt field is: %qE",
> -                        m_reg_tree, actual_refcnt, ob_refcnt);
> -
> -    // location_t loc = rich_loc->get_loc ();
> -    // foo (loc);
> +
> +    const auto *actual_refcnt_constant
> +     = m_actual_refcnt->dyn_cast_constant_svalue ();
> +    const auto *ob_refcnt_constant = m_ob_refcnt->dyn_cast_constant_svalue 
> ();
> +    if (!actual_refcnt_constant || !ob_refcnt_constant)
> +      return false;
> +
> +    auto actual_refcnt = actual_refcnt_constant->get_constant ();
> +    auto ob_refcnt = ob_refcnt_constant->get_constant ();
> +    warned = warning_meta (
> +     rich_loc, m, get_controlling_option (),
> +     "expected %qE to have "
> +     "reference count: N + %qE but ob_refcnt field is N + %qE",
> +     m_reg_tree, actual_refcnt, ob_refcnt);
>      return warned;

I know you're emulating the old behavior I implemented way back in
cpychecker, but I don't like that behavior :(

Specifically, although the patch improves the behavior for symbolic
values, it regresses the precision of wording for the concrete values
case.  If we have e.g. a concrete ob_refcnt of 2, whereas we only have
1 pointer, then it's more readable to say:

  warning: expected ‘obj’ to have reference count: ‘1’ but ob_refcnt
field is ‘2’

than:

  warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt field 
is N + ‘2’

...and we shouldn't quote concrete numbers, the message should be:

  warning: expected ‘obj’ to have reference count of 1 but ob_refcnt field is 2

or better:

  warning: ‘*obj’ is pointed to by 1 pointer but 'ob_refcnt' field is 2


Can you move the unwrapping of the svalue from the tests below into the
emit vfunc?  That way the m_actual_refcnt doesn't have to be a
constant_svalue; you could have logic in the emit vfunc to print
readable messages based on what kind of svalue it is.

Rather than 'N', it might be better to say 'initial'; how about:

  warning: ‘*obj’ is pointed to by 0 additional pointers but 'ob_refcnt' field 
has increased by 1
  warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field 
has increased by 2
  warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field 
is unchanged
  warning: ‘*obj’ is pointed to by 2 additional pointers but 'ob_refcnt' field 
has decreased by 1
  warning: ‘*obj’ is pointed to by 1 fewer pointers but 'ob_refcnt' field is 
unchanged

and similar?

Maybe have a flag that tracks whether we're talking about a concrete
value that's absolute versus a concrete value that's relative to the
initial value?


[...snip...]


> @@ -369,6 +368,19 @@ increment_region_refcnt (hash_map<const region *, int> 
> &map, const region *key)
>    refcnt = existed ? refcnt + 1 : 1;
>  }
>  
> +static const region *
> +get_region_from_svalue (const svalue *sval, region_model_manager *mgr)
> +{
> +  const auto *region_sval = sval->dyn_cast_region_svalue ();
> +  if (region_sval)
> +    return region_sval->get_pointee ();
> +
> +  const auto *initial_sval = sval->dyn_cast_initial_svalue ();
> +  if (initial_sval)
> +    return mgr->get_symbolic_region (initial_sval);
> +
> +  return nullptr;
> +}

This is dereferencing a pointer, right?

Can the caller use region_model::deref_rvalue instead?


[...snip...]

> +static void
> +unwrap_any_ob_refcnt_sval (const svalue *&ob_refcnt_sval)
> +{
> +  if (ob_refcnt_sval->get_kind () != SK_CONSTANT)
> +    {
> +      auto unwrap_cast = ob_refcnt_sval->maybe_undo_cast ();
> +      if (!unwrap_cast)
> +     unwrap_cast = ob_refcnt_sval;
> +
> +      if (unwrap_cast->get_kind () == SK_BINOP)
> +     ob_refcnt_sval = unwrap_cast->dyn_cast_binop_svalue ()->get_arg1 ();

This would be better spelled as:

         if (const binop_svalue *binop_sval = 
unwrap_cast->dyn_cast_binop_svalue ())
            ob_refcnt_sval = binop_sval->get_arg1 ();

[...snip...]

>  /* Compare ob_refcnt field vs the actual reference count of a region */
>  static void
>  check_refcnt (const region_model *model,
>             const region_model *old_model,
>             region_model_context *ctxt,
>             const hash_map<const ana::region *,
> -                          int>::iterator::reference_pair region_refcnt)
> +                          int>::iterator::reference_pair &region_refcnt)

Could really use a typedef for
  const hash_map<const ana::region *, int>
to simplify this code.

>  {
>    region_model_manager *mgr = model->get_manager ();
>    const auto &curr_region = region_refcnt.first;
>    const auto &actual_refcnt = region_refcnt.second;
> +
>    const svalue *ob_refcnt_sval
>        = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
> +  if (!ob_refcnt_sval)
> +    return;
> +
> +  unwrap_any_ob_refcnt_sval (ob_refcnt_sval);

As noted above, can the diagnostic store the pre-unwrapped
ob_refcnt_sval?  Might mean you have to do the unwrapping both here,
and later when displaying the diagnostic.  Or (probably best) track
both the original and unwrapped ob_refcnt_sval, and store both in the
pending_diagnostic.

> +
>    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
>        ob_refcnt_sval->get_type (), actual_refcnt);
> -
>    if (ob_refcnt_sval != actual_refcnt_sval)
> -    {
> -      const svalue *curr_reg_sval
> -       = mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region);
> -      tree reg_tree = old_model->get_representative_tree (curr_reg_sval);
> -      if (!reg_tree)
> -     return;
> -
> -      const auto &eg = ctxt->get_eg ();
> -      refcnt_stmt_finder finder (*eg, reg_tree);
> -      auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
> -                                           actual_refcnt_sval, reg_tree);
> -      if (pd && eg)
> -     ctxt->warn (std::move (pd), &finder);
> -    }
> +    handle_refcnt_mismatch (old_model, curr_region, ob_refcnt_sval,
> +                         actual_refcnt_sval, ctxt);
>  }
>  
>  static void
> @@ -493,8 +520,6 @@ count_all_references (const region_model *model,
>    for (const auto &cluster : *model->get_store ())
>      {
>        auto curr_region = cluster.first;
> -      if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
> -     continue;
>  
>        increment_region_refcnt (region_to_refcnt, curr_region);
>  
> @@ -505,8 +530,8 @@ count_all_references (const region_model *model,
>  
>         const svalue *unwrapped_sval
>             = binding_sval->unwrap_any_unmergeable ();
> -       // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> -       // continue;
> +       if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> +       continue;

We'll probably want a smarter test for this, that the type "inherits"
C-style from PyObject (e.g. PyLongObject).


>  
>         const region *pointee = unwrapped_sval->maybe_get_region ();
>         if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c 
> b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> index e1efd9efda5..46daf2f8975 100644
> --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> @@ -75,4 +75,23 @@ test_PyListAppend_6 ()
>    PyObject *list = NULL;
>    PyList_Append(list, item);
>    return list;
> -}
> \ No newline at end of file
> +}
> +
> +PyObject *
> +test_PyListAppend_7 (PyObject *item)
> +{
> +  PyObject *list = PyList_New (0);
> +  Py_INCREF(item);
> +  PyList_Append(list, item);
> +  return list;
> +  /* { dg-warning "expected 'item' to have reference count" "" { target 
> *-*-* } .-1 } */

It would be good if these dg-warning directives regexp contained the
actual and expected counts; I find I can't easily tell what the
intended output is meant to be.


> +}
> +
> +PyObject *
> +test_PyListAppend_8 (PyObject *item, PyObject *list)
> +{
> +  Py_INCREF(item);
> +  Py_INCREF(item);
> +  PyList_Append(list, item);
> +  return list;
> +}

Should we complain here about item->ob_refcnt being too high?

> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c 
> b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> new file mode 100644
> index 00000000000..a7f39509d6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_refcnt_1 (PyObject *obj)
> +{
> +  Py_INCREF(obj);
> +  Py_INCREF(obj);
> +  return obj;
> +  /* { dg-warning "expected 'obj' to have reference count" "" { target *-*-* 
> } .-1 } */

Likewise, it would be better for the dg-warning directive's expressed
the expected "actual vs expected" values.

[...snip...]

Thanks again for the patch, hope this is constructive

Dave

Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]

Reply via email to