Dear Pulsar Community,

I'd like to propose PIP-442, which addresses critical memory exhaustion issues in Pulsar's topic discovery commands that can cause broker crashes in production deployments.

Problem:

The current topic listing implementation lacks flow control, creating unbounded memory allocation scenarios:

- OutOfMemoryError when multiple clients request large topic lists simultaneously
- Proxy cascading failures due to unbuffered response forwarding
- Unpredictable resource usage making capacity planning difficult
- Performance degradation from GC pressure affecting all broker operations

A namespace with 10K topics can consume ~1MB per response. With 1K concurrent requests, this creates ~1GB memory pressure that can crash brokers.

Solution:

PIP-442 introduces MaxTopicListInFlightLimiter - a memory-aware semaphore system:

- Dual memory tracking for heap (topic assembly) and direct memory (network buffers)
- Asynchronous flow control with configurable timeouts and queue limits
- Permit-based system ensuring memory is released after response transmission
- Graceful degradation instead of broker crashes

Benefits:

- Prevents broker crashes from topic listing memory exhaustion
- Predictable resource usage for capacity planning
- Maintains full backward compatibility (no client changes required)
- Comprehensive monitoring with detailed metrics
- Fair resource sharing through queueing mechanisms

The implementation adds flow control at two key points: after topic retrieval from metadata store (heap) and before response serialization (direct memory).

The full proposal can be found at:
https://github.com/apache/pulsar/wiki/PIP-442

I welcome your feedback and discussion on this proposal. Please share your thoughts, concerns, or suggestions.

Best regards,
-Lari

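P.S. For anyone who wants a feel for the permit mechanism before reading the full PIP, here is a rough sketch of a memory-aware asynchronous limiter with a bounded wait queue. It is only an illustration under my own assumptions, not the code in the proposal: the class name AsyncMemoryLimiterSketch, its methods, and the byte figures are hypothetical, and timeouts and metrics are left out to keep it short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; NOT the MaxTopicListInFlightLimiter from PIP-442.
final class AsyncMemoryLimiterSketch {
    /** A request parked until enough budget is free. */
    private static final class Waiter {
        final long bytes;
        final CompletableFuture<Long> future = new CompletableFuture<>();
        Waiter(long bytes) { this.bytes = bytes; }
    }

    private final long limitBytes;
    private final int maxQueueSize;
    private final Queue<Waiter> queue = new ArrayDeque<>();
    private long usedBytes;

    AsyncMemoryLimiterSketch(long limitBytes, int maxQueueSize) {
        this.limitBytes = limitBytes;
        this.maxQueueSize = maxQueueSize;
    }

    /**
     * Requests a byte budget. The future completes with the granted size once
     * the budget is available, or fails right away if the wait queue is full.
     */
    CompletableFuture<Long> acquire(long bytes) {
        // Clamp oversized requests so a single huge response can still run alone.
        long needed = Math.min(bytes, limitBytes);
        synchronized (this) {
            // Grant immediately only if nobody is already waiting (keeps FIFO fairness).
            if (queue.isEmpty() && usedBytes + needed <= limitBytes) {
                usedBytes += needed;
                return CompletableFuture.completedFuture(needed);
            }
            if (queue.size() >= maxQueueSize) {
                return CompletableFuture.failedFuture(
                        new IllegalStateException("topic list wait queue is full"));
            }
            Waiter waiter = new Waiter(needed);
            queue.add(waiter);
            return waiter.future;
        }
    }

    /** Returns a previously granted budget, e.g. after the response was written. */
    void release(long grantedBytes) {
        List<Waiter> ready = new ArrayList<>();
        synchronized (this) {
            usedBytes -= grantedBytes;
            // Wake waiters in arrival order while their requests still fit.
            while (!queue.isEmpty() && usedBytes + queue.peek().bytes <= limitBytes) {
                Waiter waiter = queue.poll();
                usedBytes += waiter.bytes;
                ready.add(waiter);
            }
        }
        // Complete outside the lock so callbacks never run while holding it.
        for (Waiter waiter : ready) {
            waiter.future.complete(waiter.bytes);
        }
    }

    synchronized long usedBytes() {
        return usedBytes;
    }

    public static void main(String[] args) {
        AsyncMemoryLimiterSketch heap = new AsyncMemoryLimiterSketch(2_000_000, 1);
        CompletableFuture<Long> first = heap.acquire(1_500_000);  // fits: granted at once
        CompletableFuture<Long> second = heap.acquire(1_000_000); // does not fit: queued
        CompletableFuture<Long> third = heap.acquire(1_000_000);  // queue full: rejected
        System.out.println(first.isDone() + " " + second.isDone() + " "
                + third.isCompletedExceptionally()); // prints "true false true"
        heap.release(first.join()); // response sent, budget handed back
        System.out.println(second.isDone()); // prints "true"
    }
}
```

In this sketch, dual tracking would simply mean two instances, one sized for heap and one for direct memory, with the second acquired just before serialization. The rejected request is the "graceful degradation" case: the caller gets a failed future it can turn into an error response instead of the broker running out of memory.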