prashantwason commented on PR #18083:
URL: https://github.com/apache/hudi/pull/18083#issuecomment-3949598053

   Thanks @cshuo for the review!
   
   **Regarding async write support:**
   You're correct that this PR doesn't include async write support like #13892. This PR has a narrower goal: it addresses the sorting bottleneck in append write by switching from batch sorting to continuous (incremental) sorting. This eliminates the long pauses that occur when a full buffer is sorted all at once.
   
   The key benefits of continuous sorting:
   - **Non-blocking O(log n) inserts**: each record is placed in sorted position on arrival, instead of an O(n log n) sort of the whole buffer at flush time
   - **Predictable latency**: no sort spikes when the buffer fills up
   - **Incremental draining**: the oldest entries are drained and written immediately once the buffer reaches max capacity
   - **Reduced backpressure**: minimizes single-partition lag during ingestion
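   The continuous-sorting idea in the list above can be sketched as a bounded priority queue. This is an illustrative assumption, not the PR's code: `ContinuousSortBuffer` and its parameters are hypothetical, and the sketch drains the entry that sorts first, whereas the comment describes draining the oldest entries, so the PR's actual drain order may differ.

```java
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical sketch (not Hudi code): keep records ordered as they arrive
// and drain incrementally once the buffer exceeds its capacity.
final class ContinuousSortBuffer<T extends Comparable<T>> {
    private final PriorityQueue<T> heap = new PriorityQueue<>(); // natural ordering
    private final int maxCapacity;
    private final Consumer<T> writer; // stands in for the file writer

    ContinuousSortBuffer(int maxCapacity, Consumer<T> writer) {
        if (maxCapacity <= 0) {
            throw new IllegalArgumentException("maxCapacity must be positive");
        }
        this.maxCapacity = maxCapacity;
        this.writer = writer;
    }

    // O(log n) insert. When over capacity, write one entry immediately.
    // Assumption: drains the entry that sorts first; the PR may drain by age instead.
    void add(T record) {
        heap.offer(record);
        if (heap.size() > maxCapacity) {
            writer.accept(heap.poll());
        }
    }

    // Remaining entries come out in sorted order, one O(log n) poll each.
    void flush() {
        while (!heap.isEmpty()) {
            writer.accept(heap.poll());
        }
    }

    int size() {
        return heap.size();
    }
}
```

   With `maxCapacity = 3`, adding 5, 1, 4, 2, 3 writes 1 and 2 during ingestion and 3, 4, 5 at flush. One trade-off of draining mid-stream: the written output is only fully sorted if no later record sorts before one that was already written.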
   
   **Regarding benchmarks:**
   I don't have formal benchmark numbers comparing this with the existing batch-sort approach. The primary motivation was to eliminate the unpredictable latency spikes caused by batch sorting, rather than to improve overall throughput. That said, adding async write on top of continuous sorting would be a natural extension that could further improve throughput.
   
   Would you like me to add some basic benchmarks to quantify the latency improvements? Or do you have specific concerns about the approach that I should address first?
   
   @hudi-bot run azure
