I can't understand how the deadlock happens. Can you explain it in detail?
On 2022/02/17 03:00:36 PengHui Li wrote:
> Hi Lari,
>
> Thanks for raising a proposal and starting the discussion thread here.
> Increasing the thread count does not solve the underlying problem,
> and it also brings additional resource consumption.
>
> Matteo provided a great explanation, and I totally agree with it.
> We should try to optimize thread usage and reduce blocking calls,
> rather than going in the opposite direction.
>
> For the current REST API implementation, I see 2 main problems:
>
> 1. The API definition says we are using the Jetty async mode, but the
>    implementation is actually a blocking call. This is clearly bad code:
>    it confuses contributors and brings potential risks.
> 2. We don't know the exact return type from the interface definition [1]
>    until reading the implementation code here [2].
>
> So I think changing the current implementation to be purely async, and
> avoiding passing the `AsyncResponse` down to the implementation, is the
> right way to improve the REST API and fix the potential problems.
>
> [1]
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/v2/PersistentTopics.java#L1073
> [2]
> https://github.com/apache/pulsar/blob/df9a12dc90316e0d5c1d9effb416ec83fbcf9b5e/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/PersistentTopicsBase.java#L1259
>
> Thanks,
> Penghui
>
> On Thu, Feb 17, 2022 at 10:00 AM Matteo Merli <[email protected]> wrote:
>
> > Hi Lari,
> >
> > Thanks for the proposal, though I don't think it would be good to
> > change this default value.
> >
> > The Pulsar HTTP API is an administrative service and it is not on the
> > critical data path. If all the threads in the pool are busy, it is
> > good to apply back pressure and slow down incoming new requests.
> >
> > In the past, we had several cases of deadlocks involving HTTP Jetty
> > threads. In all these cases, the root cause has been a broker that, in
> > response to an HTTP request, is spawning HTTP requests to other
> > brokers, and some of these requests are going to itself.
> >
> > These kinds of deadlocks are unrelated to the number of HTTP threads
> > in the pool and must be dealt with by making the underlying operation
> > asynchronous.
> >
> > Finally, each user is free to change the default to something that
> > suits them better, though I really don't think it makes sense for us
> > to default to 200 threads, as:
> > 1. It will not improve Pulsar's performance
> > 2. It will not magically solve any deadlock issue
> > 3. It will just consume much more resources for everyone
> >
> > Replying in-line to some of the points.
> >
> > > Additional context
> > >
> > > - Examples of previous issues where Jetty threads have been occupied
> > >   and caused problems: #13666 #4756 #10619
> >
> > All these examples are cases of what I have described above: an HTTP
> > handler triggering a call to the same HTTP server, resulting in a
> > deadlock.
> >
> > Having more threads wouldn't have fixed them.
> >
> > > - Mailing list thread about "make async" changes:
> > >   https://lists.apache.org/thread/tn7rt59cd1k724l4ytfcmzx1w2sbtw7l
> >
> > The rationale for these changes doesn't have much to do with Jetty but
> > rather with making sure that we are not mixing blocking and
> > non-blocking operations when handling these requests, which is prone
> > to causing deadlocks in the broker's internal threads (irrespective of
> > any Jetty thread pool size).
> >
> > > Q: What's the reason for setting the default value to 200? If the
> > > node has just one core, what will happen?
> > >
> > > These are threads. Jetty defaults to 200 maximum threads, to prevent
> > > thread pool starvation. This is recommended when using the blocking
> > > Servlet API. The problem is that Pulsar uses the blocking Servlet
> > > API but doesn't have the number of threads that is needed and
> > > recommended for it.
> >
> > We need to keep in mind that the typical usage of Jetty is as a
> > Servlet container, dedicated to that. You have servlets doing disk IO
> > reads and blocking calls on JDBC drivers.
> >
> > Pulsar usage is *very* different from that. The Admin API is a tiny
> > portion of what a broker does. Dedicating 200 threads to it would be a
> > massive waste of resources (CPU & memory).
> >
> > > The value 200 doesn't mean that there will be 200 threads to start
> > > with. This is the maximum size for the thread pool. When the value
> > > is more than 8, Jetty will start with 8 initial threads and add more
> > > threads to the pool when all threads are occupied.
> > >
> > > Q: Do we need to take the number of system cores into consideration
> > > for the maximum threads of the thread pool?
> > >
> > > No. Jetty is different from Netty in this aspect. In Netty,
> > > everything should be asynchronous and "thou shall never block". In
> > > Jetty, the maximum number of threads for the thread pool should be
> > > set to 50-500 threads, and blocking operations are fine.
> >
> > I disagree. In a container configured with 0.5 CPU cores, it would be
> > very bad to run 200 threads, and it would have no other advantage
> > whatsoever.
> >
> > That is the reason why the default thread pool size is pegged to the
> > number of cores.
> >
> > > The recommendation for the thread pool is explained in the Jetty
> > > documentation:
> > > https://www.eclipse.org/jetty/documentation/jetty-9/index.html#_thread_pool
> > >
> > > > Thread Pool
> > > > Configure with goal of limiting memory usage maximum available.
> > > > Typically this is >50 and <500
> >
> > The recommendations for Jetty are relative to typical Jetty workloads,
> > something the Pulsar broker is definitely not.
> >
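To make the self-call scenario described above concrete: the pattern is a request handler that holds a pool thread while it blocks on a call that can only complete when another thread from the *same* pool becomes free. Below is a minimal, self-contained Java sketch of that pattern. This is not Pulsar code; the fixed pool stands in for Jetty's request thread pool, and all names (SelfCallDeadlockSketch, handleAdminRequest, innerRequest) are made up for illustration.

import java.util.concurrent.*;

public class SelfCallDeadlockSketch {

    // A tiny pool makes the deadlock show up immediately; a pool of 200 only
    // needs 200 concurrent outer requests to reach the same state.
    static final ExecutorService jettyPool = Executors.newFixedThreadPool(2);

    // Stand-in for the inner HTTP request that the broker sends to itself:
    // serving it requires a free thread from the same pool.
    static String innerRequest() throws Exception {
        return jettyPool.submit(() -> "inner response").get();
    }

    // Stand-in for a blocking admin handler: it occupies a pool thread
    // while waiting for innerRequest(), which also needs a pool thread.
    static void handleAdminRequest(int id) {
        try {
            System.out.println("request " + id + " got: " + innerRequest());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        // Two concurrent outer requests exhaust the two-thread pool; both
        // then block in get(), waiting for a thread that will never be free.
        for (int i = 0; i < 2; i++) {
            final int id = i;
            jettyPool.submit(() -> handleAdminRequest(id));
        }
        System.out.println("submitted; the pool is now deadlocked (program hangs)");
    }
}

Running this, both pool threads end up blocked in get() while the inner tasks sit in the queue, so the program hangs. A pool of 200 threads behaves the same way once enough such outer requests are in flight, which is why a larger pool only postpones this kind of deadlock rather than removing it.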
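For comparison, here is a rough sketch of the "purely async" shape suggested above, where the internal method returns a CompletableFuture (so the return type is visible from the signature) and only the REST layer touches the AsyncResponse. The method names are illustrative, not the actual Pulsar methods.

import java.util.concurrent.CompletableFuture;
import javax.ws.rs.container.AsyncResponse;

class AsyncAdminSketch {

    // Internal/base-layer method: fully non-blocking, explicit return type.
    CompletableFuture<Long> internalGetBacklogAsync(String topic) {
        // Real code would chain the broker-internal async calls here; a
        // completed future keeps the sketch self-contained.
        return CompletableFuture.completedFuture(0L);
    }

    // REST-layer method: the only place that sees the AsyncResponse.
    void getBacklog(AsyncResponse asyncResponse, String topic) {
        internalGetBacklogAsync(topic)
            .thenAccept(asyncResponse::resume)   // complete the suspended request
            .exceptionally(ex -> {               // surface failures as an error response
                asyncResponse.resume(ex);
                return null;
            });
    }
}

With this shape, the Jetty thread only schedules the work and returns immediately; the suspended request is resumed from whatever thread completes the future, so no request thread is held while waiting on another broker.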
