Github user revans2 commented on the issue: https://github.com/apache/storm/pull/2241

Issues that need to be addressed before `max.spout.pending` can be removed (sorry about the wall of text):

1. **Worker-to-worker backpressure.** The netty client and server have no backpressure in them. If for some reason the network cannot keep up, the netty client will keep buffering messages internally inside netty until you hit GC/OOM issues like those described in the design doc. We ran into this when using the ZK-based backpressure instead of `max.spout.pending` on a very high-throughput topology that hit a network glitch.

2. **Single choke point for messages leaving a worker.** Currently there is a single disruptor queue and a single thread responsible for routing all messages from within the worker to external workers. If any one of the clients sending messages to other workers blocked (backpressure), it would stop all other messages from leaving the worker. In practice this negates the "a choke up in a single bolt will not put the brakes on all the topology components" goal from your design, and as the number of workers in a topology grows, the impact when this does happen grows too.

3. **Load-aware grouping needs to be back on by default, and probably improved some.** Again, one of the stated goals of your backpressure design is "a choke up in a single bolt will not put the brakes on all the topology components". Most of the existing groupings effectively negate this, because each upstream component will, on average, send a message to every downstream component relatively frequently. Unless we can route around backed-up components, one totally blocked bolt will stop the whole topology from functioning. If throughput drops by 20% when load-aware grouping is on, then we probably want to do some profiling and understand why.

4. **Timer tick tuples.** A number of things rely on timer ticks: system-critical things, like ackers and spouts timing out tuples and metrics (at least for now), as well as code within bolts and spouts that wants to do something periodically without having a separate thread. Right now there is a single thread per worker that does these ticks. In the past it was a real pain to debug failures when that thread blocked trying to insert a message into a full queue: metrics stopped flowing, spouts stopped timing tuples out, and other really odd things happened that I cannot remember off the top of my head. I don't want to have to go back to that.

5. **Cycles (non-DAG topologies).** I know we strongly discourage users from building these types of things, and quite honestly I am happy to deprecate support for them, but we currently do support them. So if we are going to stop supporting them, let's get on with deprecating them in 1.x with clear warnings etc. Otherwise, when this change goes in, there will be angry customers with deadlocked topologies.