Hello Jonas,
> The problem arises when accessing 'old' PMTs. That is, PMTs that were
> handed over to python from the C++ domain in the past, e.g. through a
> message handling callback. It appears the PMTs are only valid throughout
> the duration of the function they were handed to.
Hm, yes, that sounds like typical C++ object lifetime behavior. (In fact,
as I'll explain below, the problem lies deeper than threading: it's about
object ownership, which is kind of borked for PMTs here, and actually not
only for those.)
So, great that you attached a test case! By the way, a segfault triggered
by valid code (and, with few exceptions, any Python code counts as valid
here) is usually a bug, and you're more than welcome to open a bug report
under [1]; note that you'll need a gnuradio.org Redmine account to see the
"New Issue" button.
Rather than just presenting an answer, I'll explain what I'm doing here,
so that you (and others) can reproduce it. I don't think there will be
much new info in here for you, Jonas, but rather than just doing what I
did to verify, I thought I'd share it.
Roughly:
1. Get a test case. You supplied one; I can easily verify that, yes,
this crashes! Great! (This is the rare occasion where one can say
"great, it crashes!"; cherish these moments...)
2. Understand the test case; you already supplied an explanation of
what it does, and that helps a lot here.
3. Throw your debugger at the problem
4. ???
5. Profit!
So, basically, we're stuck at step 3. There's a wiki page [2] that
explains what you can do with bog-standard GDB and Python scripts. The
current state of affairs is that at least Fedora (and I suspect Arch,
too) ships GDB and python-devel (or their Arch/pacman equivalents) with a
script that automatically enables Python symbol name resolution when
debugging a Python process. That is great, because it lets us see in
which Python functions things go wrong!
Then it all comes down to running (after installing the debug info for
a lot of libraries; luckily, my GDB even prints the exact package
manager commands I need to run to install the missing debug symbols)

    gdb --args python /tmp/min_err_repro.py

then typing "run" at the GDB prompt, waiting for the crash, and then
typing "bt" (short for "backtrace"). This gave me the following output:
    #0 0x00007fffef62d2c5 in boost::detail::atomic_count::atomic_exchange_and_add (dv=1, pw=0x39) at /usr/include/boost/smart_ptr/detail/atomic_count_gcc_x86.hpp:67
    #1 boost::detail::atomic_count::operator++ (this=0x39) at /usr/include/boost/smart_ptr/detail/atomic_count_gcc_x86.hpp:30
    #2 pmt::intrusive_ptr_add_ref (p=p@entry=0x31) at /home/marcus/src/gnuradio/gnuradio-runtime/lib/pmt/pmt.cc:66
    #3 0x00007fffe7e184c5 in boost::intrusive_ptr<pmt::pmt_base>::intrusive_ptr (rhs=..., this=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:92
    #4 boost::intrusive_ptr<pmt::pmt_base>::operator= (rhs=..., this=<synthetic pointer>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:129
    #5 _wrap_write_string (args=<optimized out>, kwargs=<optimized out>) at /home/marcus/src/gnuradio/build/gnuradio-runtime/swig/pmt_swigPYTHON_wrap.cxx:39897
    #6 0x00007ffff7af2796 in call_function (oparg=<optimized out>, pp_stack=0x7fffde621220) at /usr/src/debug/Python-2.7.11/Python/ceval.c:4427
    #7 PyEval_EvalFrameEx (f=f@entry=Frame 0x7fffdf682730, for file /home/marcus/.usrlocal/lib64/python2.7/site-packages/pmt/pmt_swig.py, line 3295, in write_string (obj=<swig_int_ptr(this=<SwigPyObject at remote 0x7fffdfcfaed0>) at remote 0x7fffdf659a10>), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.11/Python/ceval.c:3061
    #8 0x00007ffff7af23e2 in fast_function (nk=<optimized out>, na=<optimized out>, n=1, pp_stack=0x7fffde621360, func=<optimized out>) at /usr/src/debug/Python-2.7.11/Python/ceval.c:4513
    #9 call_function (oparg=<optimized out>, pp_stack=0x7fffde621360) at /usr/src/debug/Python-2.7.11/Python/ceval.c:4448
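If you need to do this more than once, the same GDB session can also be
run non-interactively. This is only a small sketch, assuming gdb is on
your PATH; gdb_backtrace_cmd and run_gdb_backtrace are made-up helper
names, while -batch, -ex and --args are standard GDB options:

```python
import subprocess


def gdb_backtrace_cmd(script, python="python"):
    """Build a GDB command line that runs `python script` non-interactively.

    -batch makes GDB exit after processing the -ex commands, so this does
    the same as typing "run", waiting for the crash, then typing "bt".
    """
    return ["gdb", "-batch",
            "-ex", "run",   # start the program; GDB stops on e.g. SIGSEGV
            "-ex", "bt",    # backtrace of the current (signalled) thread
            "--args", python, script]


def run_gdb_backtrace(script, python="python"):
    """Run the script under GDB; the backtrace ends up on GDB's stdout."""
    return subprocess.call(gdb_backtrace_cmd(script, python))
```

For example, run_gdb_backtrace("/tmp/min_err_repro.py"). If your GDB has
the Python extensions loaded, appending "-ex", "py-bt" to the list may
additionally print the Python-level stack.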
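So, yes, your suspicion was pretty much right: this has something to do
with the handling of objects in "Pythonland". Reading the trace bottom
up, Python calls pmt.write_string, the SWIG wrapper _wrap_write_string
copies the pmt_t it was handed (the intrusive_ptr assignment in frames #3
and #4), and incrementing the reference count (frames #0 to #2) then
dereferences p=0x31, which is clearly not a valid object address.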
PMTs are a bit special in a number of ways. I don't like all of them,
because some of them make these polymorphic types, which are meant to
be used for portability, less portable :)
So, first of all, pmt::pmt_t is actually a typedef for
boost::intrusive_ptr<pmt_base>, which is a reference-counting smart
pointer that keeps its count inside the pointed-to object itself.
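To make the "intrusive" part concrete, here is a toy Python model of how
such a pointer behaves. It is not GNU Radio code; PmtBase and
IntrusivePtr are made-up names. Copying the pointer (which is what the
intrusive_ptr assignment in the backtrace does) increments the count
stored in the object, and releasing the last copy frees the object:

```python
class PmtBase(object):
    """Hypothetical stand-in for pmt::pmt_base.

    With an intrusive pointer, the reference count is stored in the
    pointee itself rather than in a separate control block.
    """

    def __init__(self, payload):
        self.payload = payload
        self.refcount = 0
        self.alive = True


class IntrusivePtr(object):
    """Toy model of boost::intrusive_ptr<pmt_base>, i.e. pmt::pmt_t."""

    def __init__(self, target=None):
        self._p = target
        if target is not None:
            target.refcount += 1  # models intrusive_ptr_add_ref()

    def copy(self):
        # Copy construction/assignment: one more owner, so the count goes up.
        return IntrusivePtr(self._p)

    def reset(self):
        # What the destructor does: models intrusive_ptr_release().
        if self._p is not None:
            self._p.refcount -= 1
            if self._p.refcount == 0:
                self._p.alive = False  # last owner gone: object is deleted
            self._p = None

    def get(self):
        return self._p
```

Since the count lives in PmtBase, anything that holds a plain reference
to the object without going through an IntrusivePtr is invisible to it.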
Now, if you hand a pmt_t over from C++ to Python, Python needs your
object to be a CPython PyObject, which is the Python-internal "universal"
struct behind every single Python object. GNU Radio could have written
"glue code" for every single thing that we want to expose to Python from
C++, but instead, SWIG is used. SWIG (more or less) automatically
generates wrapper code for C/C++ functions, and adds PyObjects with the
appropriate properties and function delegates (including type
conversions etc.) for all the classes that we need in Python.
So, this all is a bit of an onion situation:

    Python( SWIG-generated PyObject( SWIG type abstraction( Intrusive Pointer( pmt_base ) ) ) )
Notice how we have a bit of a problem here: Python has its own
refcounting for the PyObject* that it handles. For example, when you do

    key = self.get_tags_in_range(0, offs, offs+1)[0].key

Python increases the refcount of the PyObject that
"self.get_tags_in_range(...)[0].key" evaluates to and makes "key" refer
to it, but that does not increase the refcount managed by the
intrusive_ptr! So after the GNU Radio scheduler is done calling work()
(through the C++/Python PyEval delegation), it runs a "pruning" step to
identify the tags that no longer need to be held in the block's internal
tag registry and removes them from it, which decrements their refcount.
If that count hits 0, the pmt_base the intrusive_ptr points to (and the
intrusive_ptr itself) gets deallocated.
Python's PyObject* doesn't notice any of that. It just happily calls
pmt:: functions on objects that no longer exist when you do

    print self.tags

which can lead to a segfault as early as the second iteration.
Exactly the same thing is happening with the contents of your
self.messages, except that in a single-sender/single-receiver scenario,
messages hit a zero refcount more reliably.
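The following pure-Python toy model replays that sequence. Again, it is
not GNU Radio code; Counted, SwigProxy and run_scenario are hypothetical
stand-ins. The scheduler's tag registry holds the only intrusive
reference, Python keeps references to a proxy that never touches that
count, and once the tag is pruned, the proxy points at a freed object.
Where the model raises RuntimeError, the real code reads freed memory and
may segfault:

```python
class Counted(object):
    """Hypothetical pmt_base stand-in with an embedded (intrusive) refcount."""

    def __init__(self, payload):
        self.payload = payload
        self.refcount = 0
        self.freed = False

    def add_ref(self):
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # C++ would delete the object here


class SwigProxy(object):
    """Hypothetical model of the Python object you get for a tag's .key.

    CPython refcounts this proxy, but creating, rebinding or storing the
    proxy never calls add_ref() on the object behind it.
    """

    def __init__(self, target):
        self._target = target  # borrowed reference: no add_ref()

    def read(self):
        if self._target.freed:
            # In C++ this would be a read of freed memory (possibly a segfault).
            raise RuntimeError("PMT used after it was freed")
        return self._target.payload


def run_scenario():
    registry = []                    # the block's C++-side tag registry
    tag_key = Counted("burst_start")
    tag_key.add_ref()                # the registry holds the only reference
    registry.append(tag_key)

    # Models: key = self.get_tags_in_range(0, offs, offs+1)[0].key
    key = SwigProxy(tag_key)
    stored_tags = [key]              # like self.tags: Python-side refs only
    first_read = key.read()          # fine, we're still inside work()

    # After work() returns, the scheduler prunes the tag it no longer needs.
    registry.pop().release()

    try:
        stored_tags[0].read()        # like "print self.tags" in a later call
        return first_read, "still valid"
    except RuntimeError:
        return first_read, "dangling"
```

run_scenario() returns ("burst_start", "dangling"): the Python list still
holds the proxy, yet the object behind it is gone.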
Workaround: yeah.
Either extract the information you actually need from the PMTs the
moment you get them and store it in native Python types (which is what I
do most of the time), or create a copy using PMT functions and store
that, or fix the PMT code (which would arguably be the only sane thing
to do, but my time currently doesn't allow for that).
Cheers,
Marcus
[1] http://gnuradio.org/redmine/projects/gnuradio/issues
[2] http://gnuradio.org/redmine/projects/gnuradio/wiki/TutorialsGDB
On 04.07.2016 21:33, Jonas Deitmerg wrote:
> Hello everyone,
>
> I've recently experienced some unexpected behavior when working with
> PMTs in messages and tags. Although I have already figured out how to
> avoid this issue, I'd like to know whether it's a systematic error or
> just a misunderstanding on my part.
>
> The problem arises when accessing 'old' PMTs. That is, PMTs that were
> handed over to python from the C++ domain in the past, e.g. through a
> message handling callback. It appears the PMTs are only valid throughout
> the duration of the function they were handed to.
>
> To illustrate the problem I have attached some python code which will
> reliably crash with a segmentation fault.
>
>
> Here's my current understanding of what's happening:
>
> 1. The block's thread sees a message that needs to be processed.
>
> 2. It dispatches the message (packed as pmt::pmt_t) to the callback
> function through SWIG. I assume the reference counting of the pmt
> object is lost here.
>
> 3. The python function works on the data, e.g. saves it for later use.
>
> 4. Control returns to the C++ side, the pmt object goes out of scope and
> is freed.
>
> 5. Some other python code tries to access the pmt object and a segfault
> occurs.
>
>
> Is this roughly correct? If so, is there a way to solve this nicely?
> It's obviously possible to unpack the pmt object in step 3 and save the
> contained data for later use. But I'm probably not the last one to get
> bitten by this, and it's not exactly fun to debug.
>
> My setup consists of gnuradio 3.7.9.2, swig 3.0.10 and python 2.7.11
> running on Arch Linux, kernel 4.6.3, 64 bit.
>
> Thanks in advance
> Jonas
>
>
> _______________________________________________
> Discuss-gnuradio mailing list
> [email protected]
> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio