[ 
https://issues.apache.org/jira/browse/MESOS-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853747#comment-15853747
 ] 

Alexander Rukletsov commented on MESOS-7065:
--------------------------------------------

See e.g. 
https://issues.apache.org/jira/browse/MESOS-7036?focusedCommentId=15851315&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15851315
 for initial discussions.

> Deadlock is possible when a pointer to an actor is captured in this actor's 
> callback.
> -------------------------------------------------------------------------------------
>
>                 Key: MESOS-7065
>                 URL: https://issues.apache.org/jira/browse/MESOS-7065
>             Project: Mesos
>          Issue Type: Improvement
>          Components: libprocess
>    Affects Versions: 1.0.2, 1.1.0, 1.2.0
>            Reporter: Alexander Rukletsov
>              Labels: libprocess, mesosphere
>
> A deadlock is triggered if a callback of the actor {{A a}} contains a 
> {{shared_ptr}} to itself. In this case, if the last copy of {{shared_ptr<A>}} 
> is held by the callback, {{clearAllCallbacks()}} called on {{a}}'s context 
> will try to terminate {{a}} and will never succeed.
> I've put up a [test demonstrating a 
> deadlock|https://gist.github.com/rukletsov/4ba5a70e74c50ae8229a19f7558055ff] 
> triggered by removing callbacks from a satisfied future: 
> {noformat}
> #include <iostream>
> #include <process/clock.hpp>
> #include <process/future.hpp>
> #include <process/id.hpp>
> #include <process/process.hpp>
> #include <process/shared.hpp>
> #include <stout/os.hpp>
> class SatisfierProcess : public process::Process<SatisfierProcess>
> {
> public:
>   SatisfierProcess() : ProcessBase(process::ID::generate("__satisfier__")) {}
>   virtual void finalize() { promise.discard(); }
>  process::Future<Nothing> future() { return promise.future(); }
>   void satisfy()
>   {
>     std::cout << " >> SatisfierProcess: satisfying the promise" << std::endl;
>     promise.set(Nothing());
>   }
>   process::Promise<Nothing> promise;
> };
> class Satisfier
> {
> public:
>   Satisfier()
>   {
>     process = new SatisfierProcess();
>     process::spawn(process);
>   }
>   ~Satisfier()
>   {
>     std::cout << " > ~Satisfier()" << std::endl;
>     process::terminate(process);
>     process::wait(process);
>     delete process;
>   }
>   process::Future<Nothing> future() const { return dispatch(process, 
> &SatisfierProcess::future); }
>   void satisfy() const { dispatch(process->self(), 
> &SatisfierProcess::satisfy); }
>   SatisfierProcess* process;
> };
> TEST(CircularDependencyTest, FutureCallbackRemovalTriggersDeadlock)
> {
>   {
>     // This shared pointer will go out of scope and will
>     // only be referenced from the `.onAny` callback.
>     process::Shared<Satisfier> s(new Satisfier);
>     s->future()
>       // Callback with a circular dependency. When it finishes,
>       // the only reference to `s` is in the callback itself.
>       .onAny(process::defer([s](const Future<Nothing>&) {
>         std::cout << " | First callback finished" << std::endl;
>         return;
>       }))
>       // Callback that takes some time to process. Note the absence
>       // of `defer` here: this callback is executed in the same context
>       // where the promise was set, i.e. `s->process->self()` and ensures
>       // the first callback has already finished when `clearAllCallbacks`
>       // is called. `clearAllCallbacks` removes the last reference to the
>       // `Satisfier` instance and hence calls its d-tor, i.e. `Satisfier`'s
>       // d-tor is called from the `Satisfier` context => deadlock.
>       .onAny([](const Future<Nothing>&) {
>         std::cout << " | Second callback started" << std::endl;
>         os::sleep(Seconds(2));
>         std::cout << " | Second callback finished" << std::endl;
>         return;
>       });
>     s->satisfy();
>   }
>   // Wait for all actors to process messages in their queues. Hint:
>   // this will not happen because one actor is waits on himself.
>   process::Clock::pause();
>   process::Clock::settle();
> }
> {noformat}
> The [output of the 
> test|https://gist.github.com/rukletsov/334f74e7a5ed3c2942f4ce95a3574cef] 
> looks like: 
> {noformat}
> [ RUN      ] CircularDependencyTest.FutureCallbackRemovalTriggersDeadlock
>  >> SatisfierProcess: satisfying the promise
>  | Second callback started
>  | First callback finished
>  | Second callback finished
>  > ~Satisfier()
> **** DEADLOCK DETECTED! ****
> You are waiting on process __satisfier__(1)@192.168.9.40:61447 that it is 
> currently executing.
> *** Aborted at 1485916555 (unix time) try "date -d @1485916555" if you are 
> using GNU date ***
> PC: @        0x10c2479a8 process::Future<>::Future()
> *** SIGSEGV (@0xdeb5) received by PID 39047 (TID 0x70000028d000) stack trace: 
> ***
>     @     0x7fff85acc52a _sigtramp
>     @     0x7fff6754a5c8 (unknown)
>     @        0x10c24795d process::Future<>::Future()
>     @        0x10c246990 process::Promise<>::future()
>     @        0x10f429a0d process::wait()
>     @        0x106f062cf process::wait()
>     @        0x1075a4b65 mesos::internal::tests::Satisfier::~Satisfier()
>     @        0x1075a1c15 mesos::internal::tests::Satisfier::~Satisfier()
>     @        0x1075a1b1d process::Shared<>::Data::~Data()
>     @        0x1075a19f5 process::Shared<>::Data::~Data()
>     @        0x1075a18f1 std::__1::__shared_ptr_pointer<>::__on_zero_shared()
>     @     0x7fff8515fcb8 std::__1::__shared_weak_count::__release_shared()
>     @        0x106f071bc std::__1::shared_ptr<>::~shared_ptr()
>     @        0x106f07185 std::__1::shared_ptr<>::~shared_ptr()
>     @        0x1075971c5 process::Shared<>::~Shared()
>     @        0x107590ce5 process::Shared<>::~Shared()
>     @        0x107586c25 
> mesos::internal::tests::CircularDependencyTest_FutureCallbackRemovalTriggersDeadlock_Test::TestBody()::$_0::~TestBody()
>     @        0x107586bb5 
> mesos::internal::tests::CircularDependencyTest_FutureCallbackRemovalTriggersDeadlock_Test::TestBody()::$_0::~TestBody()
>     @        0x10758b80c 
> _ZZNK7process9_DeferredIZN5mesos8internal5tests65CircularDependencyTest_FutureCallbackRemovalTriggersDeadlock_Test8TestBodyEvE3$_0EcvNSt3__18functionIFvT_EEEIRKNS_6FutureI7NothingEEEEvENUlSH_E_D2Ev
>     @        0x107587175 
> _ZZNK7process9_DeferredIZN5mesos8internal5tests65CircularDependencyTest_FutureCallbackRemovalTriggersDeadlock_Test8TestBodyEvE3$_0EcvNSt3__18functionIFvT_EEEIRKNS_6FutureI7NothingEEEEvENUlSH_E_D1Ev
>     @        0x107589c95 
> _ZNSt3__128__libcpp_compressed_pair_impIZNK7process9_DeferredIZN5mesos8internal5tests65CircularDependencyTest_FutureCallbackRemovalTriggersDeadlock_Test8TestBodyEvE3$_0EcvNS_8functionIFvT_EEEIRKNS1_6FutureI7NothingEEEEvEUlSI_E_NS_9allocatorISJ_EELj2EED2Ev
>     @        0x107589c75 
> _ZNSt3__117__compressed_pairIZNK7process9_DeferredIZN5mesos8internal5tests65CircularDependencyTest_FutureCallbackRemovalTriggersDeadlock_Test8TestBodyEvE3$_0EcvNS_8functionIFvT_EEEIRKNS1_6FutureI7NothingEEEEvEUlSI_E_NS_9allocatorISJ_EEED2Ev
>     @        0x107589c55 
> _ZNSt3__117__compressed_pairIZNK7process9_DeferredIZN5mesos8internal5tests65CircularDependencyTest_FutureCallbackRemovalTriggersDeadlock_Test8TestBodyEvE3$_0EcvNS_8functionIFvT_EEEIRKNS1_6FutureI7NothingEEEEvEUlSI_E_NS_9allocatorISJ_EEED1Ev
>     @        0x107589a44 
> _ZNSt3__110__function6__funcIZNK7process9_DeferredIZN5mesos8internal5tests65CircularDependencyTest_FutureCallbackRemovalTriggersDeadlock_Test8TestBodyEvE3$_0EcvNS_8functionIFvT_EEEIRKNS2_6FutureI7NothingEEEEvEUlSJ_E_NS_9allocatorISK_EEFvSJ_EE18destroy_deallocateEv
>     @        0x106f075fa std::__1::function<>::~function()
>     @        0x106f06395 std::__1::function<>::~function()
>     @        0x106f8b16a process::Future<>::Data::clearAllCallbacks()
>     @        0x106f8ae80 process::Future<>::_set<>()
>     @        0x106f8818d process::Future<>::set()
>     @        0x106f8f41a 
> _ZZNK7process6FutureI7NothingE7onReadyINSt3__16__bindIRMS2_FbRKS1_EJRS2_RNS4_12placeholders4__phILi1EEEEEEbEERKS2_OT_NS2_6PreferEENUlS7_E_clES7_
>     @        0x106f8f1fd 
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZNK7process6FutureI7NothingE7onReadyINS_6__bindIRMS6_FbRKS5_EJRS6_RNS_12placeholders4__phILi1EEEEEEbEERKS6_OT_NS6_6PreferEEUlSA_E_SA_EEEvDpOT_
>     @        0x106f8ee09 
> _ZNSt3__110__function6__funcIZNK7process6FutureI7NothingE7onReadyINS_6__bindIRMS5_FbRKS4_EJRS5_RNS_12placeholders4__phILi1EEEEEEbEERKS5_OT_NS5_6PreferEEUlS9_E_NS_9allocatorISO_EEFvS9_EEclES9_
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {noformat}
> Note that I've introduced a [segfault in 
> {{"process.cpp"}}|https://gist.github.com/rukletsov/929d86d0dc6e3ecbb50f10667c4dfef2]
>  to get the stack trace:
> {noformat}
> bool wait(const UPID& pid, const Duration& duration)
> {
>   process::initialize();
>   if (!pid) {
>     return false;
>   }
>   // This could result in a deadlock if some code decides to wait on a
>   // process that has invoked that code!
>   if (__process__ != nullptr && __process__->self() == pid) {
>     std::cerr << "\n**** DEADLOCK DETECTED! ****\nYou are waiting on process "
>               << pid << " that it is currently executing." << std::endl;
>     ((process::Promise<Nothing>*)0xDEAD)->future();
>   }
>   ...
> }
> {noformat}
> It would be great if we can help users prevent from getting into this 
> deadlock by either rejecting or circumventing this pattern altogether.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to