prashantwason commented on PR #18083: URL: https://github.com/apache/hudi/pull/18083#issuecomment-4029984387
@cshuo Thanks for the thoughtful follow-up. You raise a valid point about the existing `AppendWriteFunctionWithBIMBufferSort` and `AppendWriteFunctionWithDisruptorBufferSort`. Here's how continuous sorting differs from the existing async approaches:

**Existing approaches (BIM/Disruptor):**
- Batch sort (O(n log n)) at flush time, but move the sort+write to a background thread
- Require double-buffering (2x memory) or a ring buffer
- Add threading complexity (synchronization, error propagation, buffer swaps)
- Sorting still happens as a single O(n log n) burst, just on a different thread

**Continuous sorting (this PR):**
- O(log n) per insert, no batch sort at all
- Single buffer (no double-buffering overhead)
- No threading complexity, so it is simpler to reason about and debug
- Incremental draining: when the buffer fills, the oldest sorted records are written immediately
- Better for latency-sensitive workloads where predictable per-record cost is preferred over throughput

The key trade-off is:
- **BIM/Disruptor** optimize throughput by overlapping sort+write with ingestion (async), at the cost of memory and complexity
- **Continuous sort** optimizes latency predictability by eliminating sort spikes entirely (no batch sort), at the cost of higher per-record overhead (O(log n) vs O(1) insert)

These approaches are complementary: continuous sorting could potentially be combined with async write in the future. This PR adds it as an opt-in alternative via `write.buffer.sort.continuous.enabled=true`.

I don't have formal benchmark numbers yet. Would it be helpful if I ran a comparison benchmark against the BIM approach to quantify the latency distribution differences?
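
For illustration only, the continuous-sort idea described above (O(log n) per insert, a single buffer, incremental draining when full) can be sketched roughly as follows. This is not the code in this PR: the class name `ContinuousSortBuffer`, the `capacity` and `drainSize` parameters, the `Consumer`-based writer, and the choice of a `java.util.PriorityQueue` as the sorted structure are all assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

/** Hypothetical sketch of a continuously sorted write buffer; not the PR's implementation. */
class ContinuousSortBuffer<T> {
    private final PriorityQueue<T> heap; // min-heap: O(log n) add and poll
    private final int capacity;          // max records held before a drain is triggered
    private final int drainSize;         // records written out per drain
    private final Consumer<T> writer;    // stand-in for the underlying file writer

    ContinuousSortBuffer(int capacity, int drainSize,
                         Comparator<? super T> order, Consumer<T> writer) {
        if (capacity < 1 || drainSize < 1 || drainSize > capacity) {
            throw new IllegalArgumentException("require 1 <= drainSize <= capacity");
        }
        this.heap = new PriorityQueue<>(capacity, order);
        this.capacity = capacity;
        this.drainSize = drainSize;
        this.writer = writer;
    }

    /** Inserts one record; if the buffer is full, first writes out the smallest records. */
    void add(T record) {
        if (heap.size() >= capacity) {
            drain(drainSize);
        }
        heap.add(record); // O(log n); there is no batch sort anywhere
    }

    /** Writes out everything still buffered, in sorted order (e.g. at checkpoint time). */
    void flush() {
        drain(heap.size());
    }

    // Polls up to n records in ascending order and hands them to the writer.
    private void drain(int n) {
        for (int i = 0; i < n && !heap.isEmpty(); i++) {
            writer.accept(heap.poll());
        }
    }
}
```

In this sketch, each drain emits records in sorted order relative to what is currently buffered, so a record that arrives later can still sort before one that has already been written. The output is therefore ordered within the buffer window rather than globally, and the per-record cost stays bounded by O(log capacity) instead of paying an O(n log n) sort at flush time.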
