Hey Kilian,

> occasionally (possibly a race condition) our application
> crashes with a stack trace like this:
> {what}: suspending thread while at least one lock is being
> held, stack backtrace: 27 frames:
> 0x7fddfe5ba98f  : ??? + 0x7fddfe5ba98f in
> /usr/local/lib/libhpxd.so.1
> 0x7fddfe5baace  : ??? + 0x7fddfe5baace in
> /usr/local/lib/libhpxd.so.1
> 0x7fddfe5b4471  :
> hpx::detail::backtrace_direct[abi:cxx11](unsigned long) +
> 0x23 in /usr/local/lib/libhpxd.so.1
> 0x7fddff17f875  : hpx::util::verify_no_locks() + 0xb3 in
> /usr/local/lib/libhpxd.so.1
> 0x7fddfef4881f  :
> hpx::this_thread::suspend(hpx::threads::thread_state_enum,
> boost::intrusive_ptr<hpx::threads::thread_data> const&,
> hpx::util::thread_description const&, hpx::error_code&) +
> 0x87 in /usr/local/lib/libhpxd.so.1
> 0x91a1b3        :
> hpx::this_thread::suspend(hpx::threads::thread_state_enum,
> hpx::util::thread_description const&, hpx::error_code&) +
> 0x40 in [...]
> 0x91a301        :
> hpx::lcos::local::spinlock::yield(unsigned long) + 0x6e in
> [...]
> 0x91a409        : hpx::lcos::local::spinlock::lock() +
> 0x3f in [...]
> 0x9251f6        :
> std::lock_guard<hpx::lcos::local::spinlock>::lock_guard(hpx::lcos::local::
> spinlock&)
> + 0x2a in [...]
> 0x7fde0024f01f  :
> CommunicationHandler::expectImage(hpx::naming::id_type) +
> 0x91 in [...]/build/libhpx_Block.so
> What I make out of this, is this:
> One of our threads holds a lock, then requests another but
> this second one is already locked, so it tries to suspend,
> which is not allowed while holding a lock?
> However the task CommunicationHandler::expectImage that
> yields this error should hold no locks prior to requesting
> the spinlock that causes the suspend.

The check wouldn't fire if that were the case (well, the check could be
buggy, but so far it has always been correct).

> Here are my questions:
> Is my interpretation of the occurring events that lead to
> this error correct?

Yes. We added that check to make sure that no lock is being held by an HPX
thread while it is being suspended.
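
For readers unfamiliar with how such a check can work, here is a minimal, self-contained model in plain C++. It is not HPX's implementation: all names (lockcheck_model, checked_mutex, check_before_suspend, held_lock_count) are hypothetical, std::mutex stands in for an HPX mutex, and "suspending" is just a function call. The assumption it illustrates is that each lock registers itself in a thread-local set while it is held, and the check run before suspension complains if that set is not empty.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <stdexcept>

// Hypothetical model, not HPX code: a thread-local set of the locks the
// current (kernel) thread holds, plus the check run before suspending.
namespace lockcheck_model {

inline std::set<void const*>& held_locks()
{
    thread_local std::set<void const*> locks;
    return locks;
}

inline std::size_t held_lock_count() { return held_locks().size(); }

// Stand-in for the check done on suspension: fail if any lock is held.
inline void check_before_suspend()
{
    if (!held_locks().empty())
        throw std::runtime_error(
            "suspending thread while at least one lock is being held");
}

// A mutex that registers itself in held_locks() while it is locked.
class checked_mutex
{
public:
    void lock()
    {
        m_.lock();
        held_locks().insert(this);
    }
    void unlock()
    {
        held_locks().erase(this);
        m_.unlock();
    }

private:
    std::mutex m_;
};

}    // namespace lockcheck_model
```

In this model, taking a std::lock_guard<checked_mutex> and calling check_before_suspend() inside that scope throws, which mirrors the error in the backtrace quoted above.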

> For what purpose is it illegal to suspend while holding a
> lock?

It is not illegal, and we're aware that the check is a bit too strict. You
can have perfectly valid code which does that. We added this check because,
in our experience, holding a lock while suspending can cause deadlocks which
are very difficult to diagnose. This is especially true if remote operations
are involved.
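
To make the deadlock risk concrete: if a thread waits on a result while holding a mutex, and whatever produces that result needs the same mutex, neither side can proceed. A common way to avoid this is to do the work that needs the lock first, release the lock, and only then wait. The sketch below shows that shape with standard C++ types (std::mutex, std::future) standing in for their HPX counterparts; data_mtx, shared_data and sum_after_wait are hypothetical names.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <vector>

// Illustrative only: plain std:: types stand in for HPX mutexes/futures.
std::mutex data_mtx;
std::vector<int> shared_data;

// Risky shape (not shown): calling pending.get() while still holding
// data_mtx. If the producer of that future also needs data_mtx, the two
// block each other.
//
// Safer shape: copy what is needed under the lock, release the lock,
// and only then block on the future.
int sum_after_wait(std::future<int>& pending)
{
    std::vector<int> snapshot;
    {
        std::lock_guard<std::mutex> guard(data_mtx);
        snapshot = shared_data;    // the only work that needs the lock
    }                              // lock released here

    int sum = pending.get();       // may block, but no lock is held
    for (int v : snapshot)
        sum += v;
    return sum;
}
```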

You can either mark a particular lock as being safe by using the
hpx::util::ignore_while_checking class, which disables (and, once it goes
out of scope, re-enables) the check for a particular lock:

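        // std::unique_lock comes from <mutex>
        static hpx::lcos::local::mutex mtx;
        std::unique_lock<hpx::lcos::local::mutex> l(mtx);

        // exempt the mutex held by 'l' from the check while 'iwc' is alive
        hpx::util::ignore_while_checking<
            std::unique_lock<hpx::lcos::local::mutex>> iwc(&l);

        hpx::this_thread::suspend();    // this will not report the held lock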

You can also use the class hpx::util::ignore_all_while_checking to disable
checking for all locks.
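
Conceptually, both classes are scoped (RAII) guards. The following standalone model shows one way such guards could work on top of a thread-local record of held locks. It is an illustration under that assumption, not HPX's actual implementation, and every name in it (ignore_model, ignore_one, ignore_all, would_report) is hypothetical.

```cpp
#include <cassert>
#include <set>

// Hypothetical model of scoped "ignore" guards; not HPX's implementation.
namespace ignore_model {

struct thread_lock_state
{
    std::set<void const*> held;          // locks currently held
    std::multiset<void const*> ignored;  // held locks exempted from the check
    int ignore_all_depth = 0;            // > 0 disables the check entirely
};

inline thread_lock_state& state()
{
    thread_local thread_lock_state s;
    return s;
}

// True if suspending now would be reported as an error.
inline bool would_report()
{
    thread_lock_state& s = state();
    if (s.ignore_all_depth > 0)
        return false;
    for (void const* lock : s.held)
        if (s.ignored.count(lock) == 0)
            return true;
    return false;
}

// Exempts one lock for the lifetime of the guard.
class ignore_one
{
public:
    explicit ignore_one(void const* lock) : lock_(lock)
    {
        state().ignored.insert(lock_);
    }
    ~ignore_one()
    {
        auto& ignored = state().ignored;
        auto it = ignored.find(lock_);
        if (it != ignored.end())
            ignored.erase(it);    // remove one entry; outer guards stay
    }
    ignore_one(ignore_one const&) = delete;
    ignore_one& operator=(ignore_one const&) = delete;

private:
    void const* lock_;
};

// Disables the check for all locks for the lifetime of the guard.
class ignore_all
{
public:
    ignore_all() { ++state().ignore_all_depth; }
    ~ignore_all() { --state().ignore_all_depth; }
    ignore_all(ignore_all const&) = delete;
    ignore_all& operator=(ignore_all const&) = delete;
};

}    // namespace ignore_model
```

Because both guards undo their effect in the destructor, the check is restored automatically when the guarded scope ends, even if an exception leaves it early.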

If you would like to disable the lock-checking altogether, simply configure
HPX with 

    cmake -DHPX_WITH_VERIFY_LOCKS=Off ...

which disables this 'feature' completely.

> Is there a way to query the number and or nature of locks
> currently held by the executing thread?

Not at this point, at least not in the API. Do you need this functionality?

> Different tasks spawned by (possibly remote) asynchronous
> calls may be run in the same hpx worker thread... 

No, this can't happen. Every task is run on a new HPX thread. They might
'share' an underlying kernel-thread, but that is another story, I guess.

> is it
> possible for a threads different tasks to influence each
> other regarding something like held locks?

No, I don't think so. The internal data structure used to keep track of
held locks is thread-local with respect to kernel threads. Thus it is
impossible for one HPX thread (executed by a kernel thread) to hold a lock
which would get reported during the suspension of another HPX thread.
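
The kernel-thread part of this argument can be illustrated with standard C++ alone, using std::thread as a stand-in for a kernel thread and a thread_local container as a stand-in for the lock-tracking data. The function names below are hypothetical, and the sketch only shows that an entry registered on one kernel thread is not visible from another; it does not model HPX threads themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <thread>

// Each kernel thread (here: each std::thread) gets its own, separate set.
inline std::set<void const*>& held_locks_of_this_kernel_thread()
{
    thread_local std::set<void const*> locks;
    return locks;
}

// Registers a lock on a second kernel thread and returns how many locks
// that thread sees; the calling thread's set is left untouched.
inline std::size_t count_on_other_thread(void const* lock)
{
    std::size_t seen = 0;
    std::thread worker([&] {
        held_locks_of_this_kernel_thread().insert(lock);
        seen = held_locks_of_this_kernel_thread().size();
    });
    worker.join();
    return seen;
}
```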

Regards Hartmut

hpx-users mailing list