[jira] [Updated] (DISPATCH-1962) Make LSan suppressions more targeted and specific (within reason)

Jira Tue, 16 Mar 2021 01:08:06 -0700


     [ 
https://issues.apache.org/jira/browse/DISPATCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jiri Daněk updated DISPATCH-1962:
---------------------------------
    Description: 
There are some unfortunate suppressions in {{tests/lsan.supp}} at this moment:

{code}
leak:*libwebsockets*
{code}

This is way too broad. It suppresses leaks in Dispatch files, since it matches 
e.g. {{src/http-libwebsockets.c}}.

The stars at the beginning and end are actually assumed implicitly. If you do 
not want substring match, you have to do ^foo$. LSan suppression format is 
simplistic, very unlike Valgrind's.

{code}
leak:run_unit_tests.c
{code}

Same thing, any leaks revealed by running unit_tests get suppressed. This 
suppression suppresses all leak traces that include run_unit_tests.c anywhere 
in the stack. What't the point of running such tests under leak detector, then?

{code}
leak:run_unit_tests.c
leak:^libqpid-proton.so$
{code}

Same thing. The patterns suppress all leaks that include Python (or Proton) 
anywhere in the stacktrace. That means there are huge blind spots where 
dispatch leaks can hide. This is a weakness of the lsan.supp syntax (Valgrind 
suppressions can be much more targeted and discerning).

h3. Python leaks

Leaks are known and there is ongoing effort to fight them: 
https://bugs.python.org/issue1635741 (https://bugs.python.org/issue25302) and 
https://www.python.org/dev/peps/pep-3121

Here's valgrind suppression file from somebody who actually investigated the 
Python leaks and identified the harmless ones: 
https://github.com/libgit2/pygit2/blob/master/misc/valgrind-python.supp

One example of a hidden leak in dispatch, which is revealed by making Python 
suppressions more targetted:

{code}
9: Direct leak of 56 byte(s) in 1 object(s) allocated from:
9:     #0 0x7f78a3606e8f in __interceptor_malloc 
(/nix/store/g40sl3zh3nv52vj0mrl4iki5iphh5ika-gcc-10.2.0-lib/lib/libasan.so.6+0xace8f)
9:     #1 0x7f78a2d64afb in qd_malloc ../include/qpid/dispatch/ctools.h:229
9:     #2 0x7f78a2d657da in qdr_core_subscribe 
../src/router_core/route_tables.c:149
9:     #3 0x7f78a2c83072 in IoAdapter_init ../src/python_embedded.c:711
9:     #4 0x7f78a2353a6c in type_call 
(/nix/store/r85nxfnwiv45nbmf5yb60jj8ajim4m7w-python3-3.8.5/lib/libpython3.8.so.1.0+0x165a6c)
{code}

The problem is in

{code}
class Agent:
    ...
    def activate(self, address):
        ...
        self.io = IoAdapter(self.receive, address, 'L', '0', 
TREATMENT_ANYCAST_CLOSEST)
{code}

IoAdapter refers to Agent (through the bound method reference self.receive) and 
Agent refers to IoAdapter (through property self.io). Since IoAdapter is 
implemented in C and does not implement support for Python's cyclic GC, there 
is no way to break the cycle.

Heap dump in attachment. The bound method is at the top of the picture. (Ignore 
the Mock objects, I was trying to simplify the picture while not getting 
crashes due to too much meddling).

h3. Random observations

It is possible to build special Debug build of Python, which has tools to 
detect leaks, asserts to prevent negative refcounts, etc. 
https://pythonextensionpatterns.readthedocs.io/en/latest/debugging/debug_python.html#debug-version-of-python-memory-alloc-label

Use the following to detect python leaks (instead of valgrind)
https://docs.python.org/3/library/tracemalloc.html

Use https://pypi.org/project/objgraph (with graphviz) to view heap object 
trees. The following renders the picture as a png under /tmp and prints the 
path to stdout.

{code}
int ret = PyRun_SimpleString("import objgraph; 
objgraph.show_backrefs(config.global_agent, max_depth=10)\n\n");
PyErr_PrintEx(0);
assert(ret == 0);
{code}

h2. Tools

h3. CPyChecker
https://dmalcolm.fedorapeople.org/presentations/PyCon-US-2013/PyCon-US-2013-dmalcolm-StaticAnalysis.html#(38)
https://emptysqua.re/blog/analyzing-python-c-extensions-with-cpychecker/
https://nedbatchelder.com/blog/201502/cpychecker.html

h3. Pungi
http://www.cse.psu.edu/~gxt29/papers/refcount.pdf

h3. Use CFFI or similar, instead of writing a C module.

  was:
There are some unfortunate suppressions in {{tests/lsan.supp}} at this moment:

{code}
leak:*libwebsockets*
{code}

This is way too broad. It suppresses leaks in Dispatch files, since it matches 
e.g. {{src/http-libwebsockets.c}}.

The stars at the beginning and end are actually assumed implicitly. If you do 
not want substring match, you have to do ^foo$. LSan suppression format is 
simplistic, very unlike Valgrind's.

{code}
leak:run_unit_tests.c
{code}

Same thing, any leaks revealed by running unit_tests get suppressed. This 
suppression suppresses all leak traces that include run_unit_tests.c anywhere 
in the stack. What't the point of running such tests under leak detector, then?

{code}
leak:run_unit_tests.c
leak:^libqpid-proton.so$
{code}

Same thing. The patterns suppress all leaks that include Python (or Proton) 
anywhere in the stacktrace. That means there are huge blind spots where 
dispatch leaks can hide. This is a weakness of the lsan.supp syntax (Valgrind 
suppressions can be much more targeted and discerning).

h3. Python leaks

Leaks are known and there is ongoing effort to fight them: 
https://bugs.python.org/issue1635741 (https://bugs.python.org/issue25302) and 
https://www.python.org/dev/peps/pep-3121

Here's valgrind suppression file from somebody who actually investigated the 
Python leaks and identified the harmless ones: 
https://github.com/libgit2/pygit2/blob/master/misc/valgrind-python.supp

One example of a hidden leak in dispatch, which is revealed by making Python 
suppressions more targetted:

{code}
9: Direct leak of 56 byte(s) in 1 object(s) allocated from:
9:     #0 0x7f78a3606e8f in __interceptor_malloc 
(/nix/store/g40sl3zh3nv52vj0mrl4iki5iphh5ika-gcc-10.2.0-lib/lib/libasan.so.6+0xace8f)
9:     #1 0x7f78a2d64afb in qd_malloc ../include/qpid/dispatch/ctools.h:229
9:     #2 0x7f78a2d657da in qdr_core_subscribe 
../src/router_core/route_tables.c:149
9:     #3 0x7f78a2c83072 in IoAdapter_init ../src/python_embedded.c:711
9:     #4 0x7f78a2353a6c in type_call 
(/nix/store/r85nxfnwiv45nbmf5yb60jj8ajim4m7w-python3-3.8.5/lib/libpython3.8.so.1.0+0x165a6c)
{code}

The problem is in

{code}
class Agent:
    ...
    def activate(self, address):
        ...
        self.io = IoAdapter(self.receive, address, 'L', '0', 
TREATMENT_ANYCAST_CLOSEST)
{code}

IoAdapter refers to Agent (through the bound method reference self.receive) and 
Agent refers to IoAdapter (through property self.io). Since IoAdapter is 
implemented in C and does not implement support for Python's cyclic GC, there 
is no way to break the cycle.

Heap dump in attachment. The bound method is at the top of the picture. (Ignore 
the Mock objects, I was trying to simplify the picture while not getting 
crashes due to too much meddling).

h3. Random observations

It is possible to build special Debug build of Python, which has tools to 
detect leaks, asserts to prevent negative refcounts, etc. 
https://pythonextensionpatterns.readthedocs.io/en/latest/debugging/debug_python.html#debug-version-of-python-memory-alloc-label

Use the following to detect python leaks (instead of valgrind)
https://docs.python.org/3/library/tracemalloc.html

Use https://pypi.org/project/objgraph (with graphviz) to view heap object 
trees. The following renders the picture as a png under /tmp and prints the 
path to stdout.

{code}
int ret = PyRun_SimpleString("import objgraph; 
objgraph.show_backrefs(config.global_agent, max_depth=10)\n\n");
PyErr_PrintEx(0);
assert(ret == 0);
{code}


> Make LSan suppressions more targeted and specific (within reason)
> -----------------------------------------------------------------
>
>                 Key: DISPATCH-1962
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1962
>             Project: Qpid Dispatch
>          Issue Type: Test
>    Affects Versions: 1.15.0
>            Reporter: Jiri Daněk
>            Priority: Major
>         Attachments: dispatch_leaking_agent.png
>
>
> There are some unfortunate suppressions in {{tests/lsan.supp}} at this moment:
> {code}
> leak:*libwebsockets*
> {code}
> This is way too broad. It suppresses leaks in Dispatch files, since it 
> matches e.g. {{src/http-libwebsockets.c}}.
> The stars at the beginning and end are actually assumed implicitly. If you do 
> not want substring match, you have to do ^foo$. LSan suppression format is 
> simplistic, very unlike Valgrind's.
> {code}
> leak:run_unit_tests.c
> {code}
> Same thing, any leaks revealed by running unit_tests get suppressed. This 
> suppression suppresses all leak traces that include run_unit_tests.c anywhere 
> in the stack. What't the point of running such tests under leak detector, 
> then?
> {code}
> leak:run_unit_tests.c
> leak:^libqpid-proton.so$
> {code}
> Same thing. The patterns suppress all leaks that include Python (or Proton) 
> anywhere in the stacktrace. That means there are huge blind spots where 
> dispatch leaks can hide. This is a weakness of the lsan.supp syntax (Valgrind 
> suppressions can be much more targeted and discerning).
> h3. Python leaks
> Leaks are known and there is ongoing effort to fight them: 
> https://bugs.python.org/issue1635741 (https://bugs.python.org/issue25302) and 
> https://www.python.org/dev/peps/pep-3121
> Here's valgrind suppression file from somebody who actually investigated the 
> Python leaks and identified the harmless ones: 
> https://github.com/libgit2/pygit2/blob/master/misc/valgrind-python.supp
> One example of a hidden leak in dispatch, which is revealed by making Python 
> suppressions more targetted:
> {code}
> 9: Direct leak of 56 byte(s) in 1 object(s) allocated from:
> 9:     #0 0x7f78a3606e8f in __interceptor_malloc 
> (/nix/store/g40sl3zh3nv52vj0mrl4iki5iphh5ika-gcc-10.2.0-lib/lib/libasan.so.6+0xace8f)
> 9:     #1 0x7f78a2d64afb in qd_malloc ../include/qpid/dispatch/ctools.h:229
> 9:     #2 0x7f78a2d657da in qdr_core_subscribe 
> ../src/router_core/route_tables.c:149
> 9:     #3 0x7f78a2c83072 in IoAdapter_init ../src/python_embedded.c:711
> 9:     #4 0x7f78a2353a6c in type_call 
> (/nix/store/r85nxfnwiv45nbmf5yb60jj8ajim4m7w-python3-3.8.5/lib/libpython3.8.so.1.0+0x165a6c)
> {code}
> The problem is in
> {code}
> class Agent:
>     ...
>     def activate(self, address):
>         ...
>         self.io = IoAdapter(self.receive, address, 'L', '0', 
> TREATMENT_ANYCAST_CLOSEST)
> {code}
> IoAdapter refers to Agent (through the bound method reference self.receive) 
> and Agent refers to IoAdapter (through property self.io). Since IoAdapter is 
> implemented in C and does not implement support for Python's cyclic GC, there 
> is no way to break the cycle.
> Heap dump in attachment. The bound method is at the top of the picture. 
> (Ignore the Mock objects, I was trying to simplify the picture while not 
> getting crashes due to too much meddling).
> h3. Random observations
> It is possible to build special Debug build of Python, which has tools to 
> detect leaks, asserts to prevent negative refcounts, etc. 
> https://pythonextensionpatterns.readthedocs.io/en/latest/debugging/debug_python.html#debug-version-of-python-memory-alloc-label
> Use the following to detect python leaks (instead of valgrind)
> https://docs.python.org/3/library/tracemalloc.html
> Use https://pypi.org/project/objgraph (with graphviz) to view heap object 
> trees. The following renders the picture as a png under /tmp and prints the 
> path to stdout.
> {code}
> int ret = PyRun_SimpleString("import objgraph; 
> objgraph.show_backrefs(config.global_agent, max_depth=10)\n\n");
> PyErr_PrintEx(0);
> assert(ret == 0);
> {code}
> h2. Tools
> h3. CPyChecker
> https://dmalcolm.fedorapeople.org/presentations/PyCon-US-2013/PyCon-US-2013-dmalcolm-StaticAnalysis.html#(38)
> https://emptysqua.re/blog/analyzing-python-c-extensions-with-cpychecker/
> https://nedbatchelder.com/blog/201502/cpychecker.html
> h3. Pungi
> http://www.cse.psu.edu/~gxt29/papers/refcount.pdf
> h3. Use CFFI or similar, instead of writing a C module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (DISPATCH-1962) Make LSan suppressions more targeted and specific (within reason)

Reply via email to