[ https://issues.apache.org/jira/browse/MESOS-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-8255:
-----------------------------------
    Description: 
The {{ZooKeeper}} class exposed in the public C++ API for Mesos is a blocking 
interface. It dispatches into {{ZooKeeperProcess}} and blocks on the returned 
future.

This interface is used by Mesos internally for {{Group}}. As a result, it can 
block libprocess worker threads. We put in a mitigation to have libprocess use 
at least 8 worker threads to avoid this issue, but if one runs Mesos with 
modules that utilize additional {{Group}}s or other blocking code, then the 
minimum number of worker threads that one would need increases.

The {{ZooKeeper}} class should be made asynchronous to avoid blocking worker 
threads. This would require returning futures and updating any client code that 
depends on it.

In addition, libprocess can prevent deadlocks despite blocking code by spawning 
additional threads when needed.

  was:
The {{ZooKeeper}} class exposed in the public C++ API for mesos is a blocking 
interface. It dispatches into {{ZooKeeperProcess}} and blocks on the returned 
future.

This interface is used by mesos internally for {{Group}}. As a result, this can 
block libprocess worker threads. We put in a mitigation to have libprocess use 
at least 8 worker threads to avoid this issue, but if one runs mesos with 
modules that utilize additional {{Group}}s or other blocking code, then the 
minimum number of worker threads that one would need increases.

The {{ZooKeeper}} class should be made asynchronous to avoid blocking worker 
threads, this would require returning futures and updating any client code that 
depends on it. Possibly, we may want to remove it from the public C++ API or 
consider exposing a Future or callback based version instead.

In addition, libprocess can prevent deadlocks despite blocking code by spawning 
additional threads when needed.


> ZooKeeper API is blocking, can lead to deadlock of libprocess worker threads.
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-8255
>                 URL: https://issues.apache.org/jira/browse/MESOS-8255
>             Project: Mesos
>          Issue Type: Bug
>          Components: c++ api
>            Reporter: Benjamin Mahler
>
> The {{ZooKeeper}} class exposed in the public C++ API for Mesos is a blocking 
> interface. It dispatches into {{ZooKeeperProcess}} and blocks on the returned 
> future.
> This interface is used by Mesos internally for {{Group}}. As a result, it 
> can block libprocess worker threads. We put in a mitigation to have 
> libprocess use at least 8 worker threads to avoid this issue, but if one runs 
> Mesos with modules that utilize additional {{Group}}s or other blocking code, 
> then the minimum number of worker threads that one would need increases.
> The {{ZooKeeper}} class should be made asynchronous to avoid blocking worker 
> threads. This would require returning futures and updating any client code 
> that depends on it.
> In addition, libprocess can prevent deadlocks despite blocking code by 
> spawning additional threads when needed.
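
The difference between the blocking interface and a future-returning one can 
be sketched as follows. This is only an illustration in standard C++, not 
libprocess or Mesos code: {{Backend}}, {{get}}, {{getBlocking}} and the 
"data@" payload are hypothetical names standing in for an actor such as 
{{ZooKeeperProcess}} and the calls dispatched into it.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical stand-in for an actor: one thread that runs queued work in
// order. It is NOT the real ZooKeeperProcess.
class Backend {
public:
  Backend() : worker_([this] { run(); }) {}

  ~Backend() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Asynchronous style: enqueue the request and return a future at once.
  // The calling thread stays free to do other work.
  std::future<std::string> get(const std::string& path) {
    auto promise = std::make_shared<std::promise<std::string>>();
    std::future<std::string> result = promise->get_future();
    enqueue([promise, path] { promise->set_value("data@" + path); });
    return result;
  }

  // Blocking style: dispatch, then park the calling thread until the
  // backend answers. If the caller is itself a pool worker, that worker
  // is unavailable for the whole wait.
  std::string getBlocking(const std::string& path) {
    return get(path).get();
  }

private:
  void enqueue(std::function<void()> work) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(work));
    }
    cv_.notify_one();
  }

  void run() {
    for (;;) {
      std::function<void()> work;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) {
          return;  // Shutdown requested and all queued work is drained.
        }
        work = std::move(queue_.front());
        queue_.pop();
      }
      work();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool done_ = false;
  std::thread worker_;  // Declared last so the members above exist first.
};

// Usage: issue two requests without blocking, then collect the results.
inline void example() {
  Backend backend;
  std::future<std::string> a = backend.get("/a");
  std::future<std::string> b = backend.get("/b");
  assert(a.get() == "data@/a");
  assert(b.get() == "data@/b");
}
```

With the blocking variant, every caller holds a thread for the full duration 
of the request, so a fixed-size pool can run out of free workers. The 
future-returning variant lets the caller decide when, or whether, to wait.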



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
