Hey Kilian,
> occasionally (possibly a race condition) our application
> crashes with a stack trace like this:
>
> {what}: suspending thread while at least one lock is being
> held, stack backtrace: 27 frames:
> 0x7fddfe5ba98f : ??? + 0x7fddfe5ba98f in
> /usr/local/lib/libhpxd.so.1
> 0x7fddfe5baace : ??? + 0x7fddfe5baace in
> /usr/local/lib/libhpxd.so.1
> 0x7fddfe5b4471 :
> hpx::detail::backtrace_direct[abi:cxx11](unsigned long) +
> 0x23 in /usr/local/lib/libhpxd.so.1
> 0x7fddff17f875 : hpx::util::verify_no_locks() + 0xb3 in
> /usr/local/lib/libhpxd.so.1
> 0x7fddfef4881f :
> hpx::this_thread::suspend(hpx::threads::thread_state_enum,
> boost::intrusive_ptr<hpx::threads::thread_data> const&,
> hpx::util::thread_description const&, hpx::error_code&) +
> 0x87 in /usr/local/lib/libhpxd.so.1
> 0x91a1b3 :
> hpx::this_thread::suspend(hpx::threads::thread_state_enum,
> hpx::util::thread_description const&, hpx::error_code&) +
> 0x40 in [...]
> 0x91a301 :
> hpx::lcos::local::spinlock::yield(unsigned long) + 0x6e in
> [...]
> 0x91a409 : hpx::lcos::local::spinlock::lock() +
> 0x3f in [...]
> 0x9251f6 :
> std::lock_guard<hpx::lcos::local::spinlock>::lock_guard(hpx::lcos::local::spinlock&)
> + 0x2a in [...]
> 0x7fde0024f01f :
> CommunicationHandler::expectImage(hpx::naming::id_type) +
> 0x91 in [...]/build/libhpx_Block.so
>
> What I make out of this, is this:
> One of our threads holds a lock, then requests another but
> this second one is already locked, so it tries to suspend,
> which is not allowed while holding a lock?
>
> However the task CommunicationHandler::expectImage that
> yields this error should hold no locks prior to requesting
> the spinlock that causes the suspend.
The check wouldn't fire if that were the case (well, the check could be
buggy, but so far it has always been correct).
> Here are my questions:
> Is my interpretation of the occurring events that lead to
> this error correct?
Yes. We added that check to make sure that no lock is being held by an HPX
thread while it is being suspended.
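To illustrate the idea behind the check, here is a minimal standalone sketch in plain C++ (no HPX). It assumes a per-thread registry of held locks that is consulted right before a thread suspends; the names held_locks, tracked_lock_guard, verify_no_locks and suspend_point are hypothetical and are not HPX's actual implementation.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <stdexcept>

// Hypothetical per-thread registry of held locks. This only models the idea
// behind lock verification; it is not HPX's implementation.
thread_local std::set<void const*> held_locks;

// RAII guard that locks a mutex and records it as held by the calling thread.
template <typename Mutex>
class tracked_lock_guard {
public:
    explicit tracked_lock_guard(Mutex& m) : m_(m) {
        m_.lock();
        held_locks.insert(&m_);
    }
    ~tracked_lock_guard() {
        held_locks.erase(&m_);
        m_.unlock();
    }
    tracked_lock_guard(tracked_lock_guard const&) = delete;
    tracked_lock_guard& operator=(tracked_lock_guard const&) = delete;

private:
    Mutex& m_;
};

// Stand-in for the check run right before a thread is suspended.
void verify_no_locks() {
    if (!held_locks.empty())
        throw std::runtime_error(
            "suspending thread while at least one lock is being held");
}

// Hypothetical suspension point: verify first; a real scheduler would then
// switch to another task.
void suspend_point() { verify_no_locks(); }

// Returns true if suspending while holding `mtx` gets reported.
bool suspend_while_holding_is_reported(std::mutex& mtx) {
    try {
        tracked_lock_guard<std::mutex> l(mtx);
        suspend_point();  // throws: mtx is registered as held
    } catch (std::runtime_error const&) {
        return true;  // the guard was already destroyed during unwinding
    }
    return false;
}
```

Calling suspend_point() with nothing held passes silently; doing so inside the scope of a tracked_lock_guard is reported, which mirrors the "{what}: suspending thread while at least one lock is being held" error above.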
> For what purpose is it illegal to suspend while holding a
> lock?
It is not illegal, and we're aware that the check is a bit strict. You can
have perfectly valid code which does that. We added this check because, in
our experience, holding a lock while suspending can cause deadlocks which are
very difficult to diagnose. This is especially true if remote operations are
involved.
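A minimal sketch of the kind of deadlock meant here, using std::thread as a stand-in for HPX threads and a blocking future wait as a stand-in for suspension (an analogy, not HPX code). The function wait_for_helper is hypothetical; a timed lock is used so the example reports the would-be deadlock by returning -1 instead of hanging forever.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Analogy only: the calling thread waits ("suspends") on a value that a
// helper can only produce after acquiring the same lock.
int wait_for_helper(bool hold_lock_while_waiting) {
    std::timed_mutex mtx;
    std::promise<int> result;
    std::future<int> f = result.get_future();

    std::unique_lock<std::timed_mutex> l(mtx);
    if (!hold_lock_while_waiting)
        l.unlock();  // release the lock before "suspending"

    std::thread helper([&] {
        // Give up after 200 ms instead of blocking forever.
        if (mtx.try_lock_for(std::chrono::milliseconds(200))) {
            result.set_value(42);
            mtx.unlock();
        } else {
            result.set_value(-1);  // this would have been a deadlock
        }
    });

    int value = f.get();  // "suspend" until the helper delivers
    helper.join();
    return value;  // 'l' unlocks mtx here if it still owns it
}
```

wait_for_helper(false) releases the lock first and gets 42 back; wait_for_helper(true) keeps holding it while waiting, so the helper can never make progress. With a plain (non-timed) lock, and possibly a remote party in the loop, the second case simply hangs.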
You can mark a particular lock as being safe by using the
hpx::util::ignore_while_checking class
(https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/util/register_locks.hpp#L62),
which disables/enables the check for a particular lock:
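#include <hpx/hpx.hpp>
#include <hpx/util/register_locks.hpp>
#include <mutex>
// somewhere inside code running on an HPX thread:
{
    static hpx::lcos::local::mutex mtx;
    std::unique_lock<hpx::lcos::local::mutex> l(mtx);
    // exempt this lock from the check for the lifetime of 'iwc'
    hpx::util::ignore_while_checking<std::unique_lock<hpx::lcos::local::mutex>> iwc(&l);
    hpx::this_thread::suspend();   // this will not report the held lock
}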
You can also use the class hpx::util::ignore_all_while_checking
(https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/util/register_locks.hpp#L43)
to disable checking for all locks.
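For illustration only, here is a standalone model of how a per-lock opt-out and a global opt-out can interact with such a check. The classes ignore_one and ignore_all are hypothetical stand-ins playing roughly the roles of hpx::util::ignore_while_checking and hpx::util::ignore_all_while_checking; they are not HPX's code, and "holding" a lock here simply means its address is in held_locks.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

// Hypothetical per-thread bookkeeping (a model, not HPX's implementation).
thread_local std::set<void const*> held_locks;
thread_local std::set<void const*> ignored_locks;
thread_local int ignore_all_depth = 0;

// Report any held lock that is not explicitly ignored.
void verify_no_locks() {
    if (ignore_all_depth > 0)
        return;  // checking is disabled for all locks
    for (void const* lock : held_locks)
        if (ignored_locks.count(lock) == 0)
            throw std::runtime_error("lock held while suspending");
}

// Ignore one specific lock for the lifetime of this object.
class ignore_one {
public:
    explicit ignore_one(void const* lock) : lock_(lock) {
        ignored_locks.insert(lock_);
    }
    ~ignore_one() { ignored_locks.erase(lock_); }
    ignore_one(ignore_one const&) = delete;
    ignore_one& operator=(ignore_one const&) = delete;

private:
    void const* lock_;
};

// Ignore every lock for the lifetime of this object (nesting is allowed).
struct ignore_all {
    ignore_all() { ++ignore_all_depth; }
    ~ignore_all() { --ignore_all_depth; }
    ignore_all(ignore_all const&) = delete;
    ignore_all& operator=(ignore_all const&) = delete;
};

// Returns true if verify_no_locks() reports a problem.
bool check_reports() {
    try {
        verify_no_locks();
    } catch (std::runtime_error const&) {
        return true;
    }
    return false;
}
```

Both opt-outs are scoped (RAII), so the check is automatically re-enabled once the guard object goes out of scope; ignoring one lock still reports any other lock held at the same time.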
If you would like to disable the lock-checking altogether, simply configure
HPX with
cmake -DHPX_WITH_VERIFY_LOCKS=Off ...
which disables this 'feature' completely.
> Is there a way to query the number and or nature of locks
> currently held by the executing thread?
Not at this point, at least not in the API. Do you need this functionality?
> Different tasks spawned by (possibly remote) asynchronous
> calls may be run in the same hpx worker thread...
No, this can't happen. Every task is run on a new HPX thread. They might
'share' an underlying kernel-thread, but that is another story, I guess.
> is it
> possible for a thread's different tasks to influence each
> other regarding something like held locks?
No, I don't think so. The internal data structure which is used to keep
track of held locks is thread-local with respect to kernel-threads. Thus it is
impossible for one HPX thread (executed by a kernel-thread) to have a lock
held which would get reported during suspension of another HPX thread.
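A small standalone sketch of why thread-local bookkeeping keeps threads apart, with two std::thread objects playing the kernel-threads. The counter held_lock_count and the function count_seen_by_other_thread are hypothetical; this models only the kernel-thread isolation, not how HPX carries the information for an HPX thread across its suspension.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical per-kernel-thread count of held locks.
thread_local int held_lock_count = 0;

// Thread A marks a lock as held and keeps it until thread B has looked at its
// own counter. Returns the value thread B observed.
int count_seen_by_other_thread() {
    std::mutex m;
    std::condition_variable cv;
    bool a_holding = false;
    bool b_done = false;
    int seen_by_b = -1;

    std::thread a([&] {
        ++held_lock_count;  // modifies only thread A's copy
        {
            std::unique_lock<std::mutex> l(m);
            a_holding = true;
            cv.notify_all();
            cv.wait(l, [&] { return b_done; });
        }
        --held_lock_count;
    });

    std::thread b([&] {
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [&] { return a_holding; });
        seen_by_b = held_lock_count;  // thread B's own copy, unaffected by A
        b_done = true;
        cv.notify_all();
    });

    a.join();
    b.join();
    return seen_by_b;
}
```

Even while thread A is "holding" a lock, thread B sees a count of zero, so a check performed on B could not report A's lock.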
HTH
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
_______________________________________________
hpx-users mailing list
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users