On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote: > Hi Dave,
Hi Eric, thanks for the patch. > > Recently I've been working on symbolic value support for the reference > count checker. I've attached a patch for it below; let me know it looks > OK for trunk. Thanks! > > Best, > Eric > > --- > > This patch enhances the reference count checker in the CPython plugin by > adding support for symbolic values. Whereas previously we were only able > to check the reference count of PyObject* objects created in the scope > of the function; we are now able to emit diagnostics on reference count > mismatch of objects that were, for example, passed in as a function > parameter. > > rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but > ob_refcnt field is N + ‘2’ > 6 | return obj; > | ^~~ [...snip...] > create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > index bf1982e79c3..d7ecd7fce09 100644 > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > @@ -314,17 +314,20 @@ public: > { > diagnostic_metadata m; > bool warned; > - // just assuming constants for now > - auto actual_refcnt > - = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant (); > - auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant > (); > - warned = warning_meta (rich_loc, m, get_controlling_option (), > - "expected %qE to have " > - "reference count: %qE but ob_refcnt field is: %qE", > - m_reg_tree, actual_refcnt, ob_refcnt); > - > - // location_t loc = rich_loc->get_loc (); > - // foo (loc); > + > + const auto *actual_refcnt_constant > + = m_actual_refcnt->dyn_cast_constant_svalue (); > + const auto *ob_refcnt_constant = m_ob_refcnt->dyn_cast_constant_svalue > (); > + if (!actual_refcnt_constant || !ob_refcnt_constant) > + return false; > + > + auto actual_refcnt = actual_refcnt_constant->get_constant (); > + auto ob_refcnt = ob_refcnt_constant->get_constant (); > + warned = warning_meta ( > + rich_loc, m, get_controlling_option (), > + "expected %qE to have " > + "reference count: N + %qE but ob_refcnt field is N + %qE", > + m_reg_tree, actual_refcnt, ob_refcnt); > return warned; I know you're emulating the old behavior I implemented way back in cpychecker, but I don't like that behavior :( Specifically, although the patch improves the behavior for symbolic values, it regresses the precision of wording for the concrete values case. If we have e.g. a concrete ob_refcnt of 2, whereas we only have 1 pointer, then it's more readable to say: warning: expected ‘obj’ to have reference count: ‘1’ but ob_refcnt field is ‘2’ than: warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt field is N + ‘2’ ...and we shouldn't quote concrete numbers, the message should be: warning: expected ‘obj’ to have reference count of 1 but ob_refcnt field is 2 or better: warning: ‘*obj’ is pointed to by 1 pointer but 'ob_refcnt' field is 2 Can you move the unwrapping of the svalue from the tests below into the emit vfunc? That way the m_actual_refcnt doesn't have to be a constant_svalue; you could have logic in the emit vfunc to print readable messages based on what kind of svalue it is. Rather than 'N', it might be better to say 'initial'; how about: warning: ‘*obj’ is pointed to by 0 additional pointers but 'ob_refcnt' field has increased by 1 warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field has increased by 2 warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field is unchanged warning: ‘*obj’ is pointed to by 2 additional pointers but 'ob_refcnt' field has decreased by 1 warning: ‘*obj’ is pointed to by 1 fewer pointers but 'ob_refcnt' field is unchanged and similar? Maybe have a flag that tracks whether we're talking about a concrete value that's absolute versus a concrete value that's relative to the initial value? [...snip...] > @@ -369,6 +368,19 @@ increment_region_refcnt (hash_map<const region *, int> > &map, const region *key) > refcnt = existed ? refcnt + 1 : 1; > } > > +static const region * > +get_region_from_svalue (const svalue *sval, region_model_manager *mgr) > +{ > + const auto *region_sval = sval->dyn_cast_region_svalue (); > + if (region_sval) > + return region_sval->get_pointee (); > + > + const auto *initial_sval = sval->dyn_cast_initial_svalue (); > + if (initial_sval) > + return mgr->get_symbolic_region (initial_sval); > + > + return nullptr; > +} This is dereferencing a pointer, right? Can the caller use region_model::deref_rvalue instead? [...snip...] > +static void > +unwrap_any_ob_refcnt_sval (const svalue *&ob_refcnt_sval) > +{ > + if (ob_refcnt_sval->get_kind () != SK_CONSTANT) > + { > + auto unwrap_cast = ob_refcnt_sval->maybe_undo_cast (); > + if (!unwrap_cast) > + unwrap_cast = ob_refcnt_sval; > + > + if (unwrap_cast->get_kind () == SK_BINOP) > + ob_refcnt_sval = unwrap_cast->dyn_cast_binop_svalue ()->get_arg1 (); This would be better spelled as: if (const binop_svalue *binop_sval = unwrap_cast->dyn_cast_binop_svalue ()) ob_refcnt_sval = binop_sval->get_arg1 (); [...snip...] > /* Compare ob_refcnt field vs the actual reference count of a region */ > static void > check_refcnt (const region_model *model, > const region_model *old_model, > region_model_context *ctxt, > const hash_map<const ana::region *, > - int>::iterator::reference_pair region_refcnt) > + int>::iterator::reference_pair ®ion_refcnt) Could really use a typedef for const hash_map<const ana::region *, int> to simplify this code. > { > region_model_manager *mgr = model->get_manager (); > const auto &curr_region = region_refcnt.first; > const auto &actual_refcnt = region_refcnt.second; > + > const svalue *ob_refcnt_sval > = retrieve_ob_refcnt_sval (curr_region, model, ctxt); > + if (!ob_refcnt_sval) > + return; > + > + unwrap_any_ob_refcnt_sval (ob_refcnt_sval); As noted above, can the diagnostic store the pre-unwrapped ob_refcnt_sval? Might mean you have to do the unwrapping both here, and later when displaying the diagnostic. Or (probably best) track both the original and unwrapped ob_refcnt_sval, and store both in the pending_diagnostic. > + > const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst ( > ob_refcnt_sval->get_type (), actual_refcnt); > - > if (ob_refcnt_sval != actual_refcnt_sval) > - { > - const svalue *curr_reg_sval > - = mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region); > - tree reg_tree = old_model->get_representative_tree (curr_reg_sval); > - if (!reg_tree) > - return; > - > - const auto &eg = ctxt->get_eg (); > - refcnt_stmt_finder finder (*eg, reg_tree); > - auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval, > - actual_refcnt_sval, reg_tree); > - if (pd && eg) > - ctxt->warn (std::move (pd), &finder); > - } > + handle_refcnt_mismatch (old_model, curr_region, ob_refcnt_sval, > + actual_refcnt_sval, ctxt); > } > > static void > @@ -493,8 +520,6 @@ count_all_references (const region_model *model, > for (const auto &cluster : *model->get_store ()) > { > auto curr_region = cluster.first; > - if (curr_region->get_kind () != RK_HEAP_ALLOCATED) > - continue; > > increment_region_refcnt (region_to_refcnt, curr_region); > > @@ -505,8 +530,8 @@ count_all_references (const region_model *model, > > const svalue *unwrapped_sval > = binding_sval->unwrap_any_unmergeable (); > - // if (unwrapped_sval->get_type () != pyobj_ptr_tree) > - // continue; > + if (unwrapped_sval->get_type () != pyobj_ptr_tree) > + continue; We'll probably want a smarter test for this, that the type "inherits" C-style from PyObject (e.g. PyLongObject). > > const region *pointee = unwrapped_sval->maybe_get_region (); > if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED) > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > index e1efd9efda5..46daf2f8975 100644 > --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > @@ -75,4 +75,23 @@ test_PyListAppend_6 () > PyObject *list = NULL; > PyList_Append(list, item); > return list; > -} > \ No newline at end of file > +} > + > +PyObject * > +test_PyListAppend_7 (PyObject *item) > +{ > + PyObject *list = PyList_New (0); > + Py_INCREF(item); > + PyList_Append(list, item); > + return list; > + /* { dg-warning "expected 'item' to have reference count" "" { target > *-*-* } .-1 } */ It would be good if these dg-warning directives regexp contained the actual and expected counts; I find I can't easily tell what the intended output is meant to be. > +} > + > +PyObject * > +test_PyListAppend_8 (PyObject *item, PyObject *list) > +{ > + Py_INCREF(item); > + Py_INCREF(item); > + PyList_Append(list, item); > + return list; > +} Should we complain here about item->ob_refcnt being too high? > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > new file mode 100644 > index 00000000000..a7f39509d6d > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target analyzer } */ > +/* { dg-options "-fanalyzer" } */ > +/* { dg-require-python-h "" } */ > + > + > +#define PY_SSIZE_T_CLEAN > +#include <Python.h> > +#include "../analyzer/analyzer-decls.h" > + > +PyObject * > +test_refcnt_1 (PyObject *obj) > +{ > + Py_INCREF(obj); > + Py_INCREF(obj); > + return obj; > + /* { dg-warning "expected 'obj' to have reference count" "" { target *-*-* > } .-1 } */ Likewise, it would be better for the dg-warning directive's expressed the expected "actual vs expected" values. [...snip...] Thanks again for the patch, hope this is constructive Dave