prashantwason commented on PR #18083: URL: https://github.com/apache/hudi/pull/18083#issuecomment-3949598053
Thanks @cshuo for the review!

**Regarding async write support:**

You're correct that this PR doesn't include async write like #13892. The goal of this PR is more focused - it specifically addresses the sorting bottleneck in append write by switching from batch sorting to continuous (incremental) sorting. This eliminates the large pause times that occur when sorting a full buffer.

The key benefits of continuous sorting:

- **Non-blocking O(log n) inserts** vs O(n log n) batch sort at flush time
- **Predictable latency** - no sort spikes when buffer fills up
- **Incremental draining** - oldest entries are drained and written immediately when buffer reaches max capacity
- **Reduced backpressure** - minimizes single-partition lag during ingestion

**Regarding benchmarks:**

I don't have formal benchmark numbers comparing to the existing batch sort approach. The primary motivation was to eliminate the unpredictable latency spikes from batch sorting rather than overall throughput improvement. That said, adding async write on top of continuous sorting would be a natural extension that could improve throughput further.

Would you like me to add some basic benchmarking to quantify the latency improvements? Or do you have specific concerns about the approach that I should address first?

@hudi-bot run azure

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
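
To illustrate the continuous sorting idea described in the comment above, here is a minimal sketch. It is not the PR's implementation: the class name `ContinuousSortBuffer`, its constructor parameters (`maxCapacity`, `drainSize`, `sink`), and the choice of a binary heap are all assumptions made for illustration. It also assumes that "oldest entries" means the smallest entries in sort order. Each insert costs O(log n), and when the buffer is full a small batch is drained to the writer instead of sorting the whole buffer at once.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical illustration only; not a class from the Hudi codebase.
final class ContinuousSortBuffer<T> {
  private final PriorityQueue<T> heap;   // keeps the smallest record at the head
  private final int maxCapacity;         // maximum number of buffered records
  private final int drainSize;           // records written out each time the buffer is full
  private final Consumer<T> sink;        // stands in for the file writer

  ContinuousSortBuffer(Comparator<T> order, int maxCapacity, int drainSize, Consumer<T> sink) {
    if (maxCapacity < 1 || drainSize < 1 || drainSize > maxCapacity) {
      throw new IllegalArgumentException("require 1 <= drainSize <= maxCapacity");
    }
    this.heap = new PriorityQueue<>(maxCapacity, order);
    this.maxCapacity = maxCapacity;
    this.drainSize = drainSize;
    this.sink = sink;
  }

  // O(log n) insert; drains a small batch first if the buffer is already full,
  // so there is never a full O(n log n) sort of the buffer.
  void add(T record) {
    if (heap.size() >= maxCapacity) {
      drain(drainSize);
    }
    heap.offer(record);
  }

  // Writes out everything that is still buffered, in sort order.
  void flush() {
    drain(heap.size());
  }

  int size() {
    return heap.size();
  }

  private void drain(int count) {
    for (int i = 0; i < count && !heap.isEmpty(); i++) {
      sink.accept(heap.poll());
    }
  }
}
```

In this sketch the output is sorted only within the window the buffer can hold: a record that arrives after larger records have already been drained is written after them. A real writer would also need to account for record sizes and memory limits, which are omitted here.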
