eolivelli opened a new issue, #17952:
URL: https://github.com/apache/pulsar/issues/17952

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Motivation
   
   Currently (Pulsar 2.10) there is a problem when a broker serves many subscriptions, consumers (even 20,000 consumers), and partitions: it is very easy to push the broker to OutOfMemory (Netty direct memory) under that load if you do not configure rate limiting on message dispatch.
   
   The out-of-direct-memory (OODM) error happens because the broker tries to read too much data from the bookies: nothing protects the broker from running out of memory while reading.
   
   Some points:
   - Subscriptions are independent of each other (think of dispatcherMaxReadBatchSize, dispatcherMaxReadSizeBytes, dispatcherMaxRoundRobinBatchSize)
   - Every subscription pumps data from the bookies
   - There is no global cap on the amount of data read from BK
   - By the time the data arrives at the broker it is too late: Netty tries to allocate a direct ByteBuf and crashes
   - The more bookies you have, the more efficient the transfer becomes, so the problem grows with cluster size
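   To see how quickly independent per-subscription reads add up, here is a back-of-the-envelope worst case. The partition and subscription counts match the reproducer later in this issue; the 5 MiB value for `dispatcherMaxReadSizeBytes` is an assumed default, so treat the numbers as an illustration, not a measurement:

   ```java
   public class WorstCaseReadEstimate {
       /** Upper bound: every subscription on every partition has a full-size read in flight. */
       static long worstCaseBytes(int partitions, int subscriptionsPerPartition, long maxReadSizeBytes) {
           return (long) partitions * subscriptionsPerPartition * maxReadSizeBytes;
       }

       public static void main(String[] args) {
           // 10 partitions and 64 subscriptions, as in the reproducer below;
           // 5 MiB per read is the assumed dispatcherMaxReadSizeBytes default.
           long bytes = worstCaseBytes(10, 64, 5L * 1024 * 1024);
           System.out.printf("Worst-case in-flight read bytes: %d (~%.2f GiB)%n",
                   bytes, bytes / (1024.0 * 1024 * 1024));
           // With these numbers the bound is ~3.1 GiB, which can exceed a
           // typical broker's direct memory allocation on its own.
       }
   }
   ```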
   
   Additional considerations:
   On the read path there is no way to know how long the data coming from BK will be retained.
   This is different from maxMessagePublishBufferSizeInMB, because on the write path you know that the data will go to the BK client and eventually be released.
   
   I believe the heart of the problem is preventing the broker from issuing reads that would lead to the OODM.
   This is not easy: since all subscriptions and topics are independent, we would need some “fair” mechanism that still allows each subscription to make progress.
   
   I have a very simple reproducer for this theory; it breaks both big test clusters (with many brokers and bookies) and Pulsar standalone.
   
   ```shell
   # create a partitioned topic
   bin/pulsar-admin topics create-partitioned-topic -p 10 test

   # create the 64 subscriptions
   bin/pulsar-perf consume -ns 64 -n 10 test -st Shared
   # (Ctrl-C once the subscriptions exist)

   # create a backlog with big non-batched messages
   bin/pulsar-perf produce -s 1000000 -bb 0 -bm 0 -r 1000 test

   # break the broker by starting all the consumers together
   bin/pulsar-perf consume -ns 64 -n 10 test -st Shared
   ```
   
   
   ### Solution
   
   One solution is to put a hard limit on the total size of pending reads from storage.
   We can estimate the size of an upcoming "pending read" by using statistics on the size of the most recently read entries from the topic.
   The limit must be per-broker.
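   A minimal sketch of what such a per-broker limiter could look like. This is not Pulsar's actual API: the class name, the moving-average estimator, and the acquire/release protocol are all hypothetical. Each dispatcher would reserve an estimated byte count before issuing a read to BookKeeper and release it once the entries have been dispatched and their buffers freed:

   ```java
   import java.util.concurrent.atomic.AtomicLong;

   /**
    * Hypothetical per-broker cap on the estimated bytes of in-flight reads
    * from BookKeeper (illustration only, not Pulsar's implementation).
    */
   public class PendingReadLimiter {
       private final long maxPendingReadBytes;      // per-broker cap
       private final AtomicLong pendingBytes = new AtomicLong();
       // Exponential moving average of recently observed entry sizes,
       // used to estimate the size of an upcoming read.
       private volatile double avgEntrySize = 1024; // initial guess: 1 KiB

       public PendingReadLimiter(long maxPendingReadBytes) {
           this.maxPendingReadBytes = maxPendingReadBytes;
       }

       /** Estimated bytes for a read of {@code numEntries} entries. */
       public long estimate(int numEntries) {
           return (long) (avgEntrySize * numEntries);
       }

       /** Reserve capacity for a read; false means the broker is at its cap. */
       public boolean tryAcquire(long estimatedBytes) {
           long prev = pendingBytes.getAndAdd(estimatedBytes);
           if (prev + estimatedBytes > maxPendingReadBytes) {
               pendingBytes.addAndGet(-estimatedBytes);
               return false;                        // caller should defer the read
           }
           return true;
       }

       /** Release the reservation and refine the estimator with actual bytes. */
       public void release(long estimatedBytes, long actualBytes, int numEntries) {
           pendingBytes.addAndGet(-estimatedBytes);
           if (numEntries > 0) {
               double observed = (double) actualBytes / numEntries;
               avgEntrySize = 0.9 * avgEntrySize + 0.1 * observed;
           }
       }
   }
   ```

   A real implementation would also need a fairness policy for deciding which deferred reads to retry first, so that one hot subscription cannot starve the others.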
   
   
   
   ### Alternatives
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!

