[
https://issues.apache.org/jira/browse/MESOS-8255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benjamin Mahler updated MESOS-8255:
-----------------------------------
Description:
The {{ZooKeeper}} class exposed in the public C++ API for Mesos is a blocking
interface. It dispatches into {{ZooKeeperProcess}} and blocks on the returned
future.
This interface is used by Mesos internally for {{Group}}. As a result, it can
block libprocess worker threads. We put in a mitigation to have libprocess use
at least 8 worker threads to avoid this issue, but if one runs Mesos with
modules that use additional {{Group}}s or other blocking code, then the
minimum number of worker threads that one needs increases.
The {{ZooKeeper}} class should be made asynchronous to avoid blocking worker
threads. This would require returning futures and updating any client code that
depends on it.
In addition, libprocess can prevent deadlocks despite blocking code by spawning
additional threads when needed.
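To illustrate the difference between the two styles, here is a minimal sketch that uses only the C++ standard library. It is not the Mesos or libprocess code: {{WorkerPool}}, {{Client}}, {{getBlocking}}, {{get}} and {{demoSingleWorker}} are hypothetical names, and a callback stands in for a future with continuations because {{std::future}} has none.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fixed pool of worker threads.
class WorkerPool {
public:
  explicit WorkerPool(std::size_t n) {
    assert(n > 0);
    for (std::size_t i = 0; i < n; ++i) {
      threads_.emplace_back([this] { run(); });
    }
  }

  // Drains the remaining queue, then joins every worker.
  ~WorkerPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_all();
    for (auto& t : threads_) t.join();
  }

  void dispatch(std::function<void()> f) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(f));
    }
    cv_.notify_one();
  }

private:
  void run() {
    for (;;) {
      std::function<void()> f;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) return;  // Shutting down and nothing left to run.
        f = std::move(queue_.front());
        queue_.pop_front();
      }
      f();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  std::vector<std::thread> threads_;
  bool done_ = false;
};

// Hypothetical client offering both styles of the same operation.
class Client {
public:
  explicit Client(WorkerPool& pool) : pool_(pool) {}

  // Blocking style: dispatch, then block the caller on the future. If every
  // worker thread is parked in a call like this, the dispatched work can
  // never run and the pool deadlocks.
  std::string getBlocking(const std::string& path) {
    std::promise<std::string> promise;
    std::future<std::string> future = promise.get_future();
    pool_.dispatch([&promise, path] { promise.set_value("data:" + path); });
    return future.get();
  }

  // Asynchronous style: return immediately and deliver the result to a
  // continuation, so the calling worker thread is released.
  void get(const std::string& path, std::function<void(std::string)> done) {
    pool_.dispatch([path, done] { done("data:" + path); });
  }

private:
  WorkerPool& pool_;
};

// Runs a task on a one-thread pool that needs a result from the client.
// Declared before the pool so the workers are joined before these die.
std::string demoSingleWorker() {
  std::promise<std::string> result;
  std::future<std::string> future = result.get_future();
  WorkerPool pool(1);
  Client client(pool);

  pool.dispatch([&client, &result] {
    // Calling client.getBlocking("/a") here would hang: the only worker
    // would wait on work that only it could execute.
    client.get("/a", [&result](std::string v) { result.set_value(std::move(v)); });
  });

  return future.get();  // Blocking is safe here: this is not a worker thread.
}
```

In this sketch, {{demoSingleWorker()}} completes with a single worker thread because the task returns to the pool instead of waiting, whereas the same task written with {{getBlocking}} would need at least one spare worker to make progress.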
was:
The {{ZooKeeper}} class exposed in the public C++ API for mesos is a blocking
interface. It dispatches into {{ZooKeeperProcess}} and blocks on the returned
future.
This interface is used by mesos internally for {{Group}}. As a result, this can
block libprocess worker threads. We put in a mitigation to have libprocess use
at least 8 worker threads to avoid this issue, but if one runs mesos with
modules that utilize additional {{Group}}s or other blocking code, then the
minimum number of worker threads that one would need increases.
The {{ZooKeeper}} class should be made asynchronous to avoid blocking worker
threads. This would require returning futures and updating any client code that
depends on it. Possibly, we may want to remove it from the public C++ API or
consider exposing a Future- or callback-based version instead.
In addition, libprocess can prevent deadlocks despite blocking code by spawning
additional threads when needed.
> ZooKeeper API is blocking, can lead to deadlock of libprocess worker threads.
> -----------------------------------------------------------------------------
>
> Key: MESOS-8255
> URL: https://issues.apache.org/jira/browse/MESOS-8255
> Project: Mesos
> Issue Type: Bug
> Components: c++ api
> Reporter: Benjamin Mahler