prashantwason opened a new pull request, #18083:
URL: https://github.com/apache/hudi/pull/18083

   ### Describe the issue this Pull Request addresses
   
   This PR implements a continuous sorting feature for the Flink append write 
sink. The continuous sorting mode maintains sorted order incrementally using a 
TreeMap, which avoids large pause times from batch sorting and reduces 
single-partition lag during ingestion by minimizing backpressure.
   
   ### Summary and Changelog
   
   **Summary:** Added a continuous sorting mode 
(`AppendWriteFunctionWithContinuousSort`) that provides non-blocking O(log n) 
inserts and incremental draining, offering predictable latency without sort 
spikes.
   
   **Changelog:**
   - Added `AppendWriteFunctionWithContinuousSort` class that keeps records in 
a TreeMap keyed by a code-generated normalized key and an insertion sequence
   - When the buffer reaches max capacity, the oldest entries are drained and 
written immediately
   - Updated `AppendWriteFunctions.create` to instantiate the continuous sorter 
when `WRITE_BUFFER_SORT_CONTINUOUS_ENABLED` is true
   - Introduced new FlinkOptions:
     - `write.buffer.sort.continuous.enabled` - Whether to use continuous 
sorting instead of batch sorting
     - `write.buffer.sort.continuous.drain.size` - Number of records to drain 
each time max capacity is reached
   - Added `ITTestAppendWriteFunctionWithContinuousSort` integration tests 
covering buffer flush triggers, sorted output correctness, drain behaviors, and 
invalid-parameter error cases
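
   The buffering scheme described in the changelog can be sketched roughly as 
follows. This is a minimal illustration, not the code in this PR: the class 
`ContinuousSortBufferSketch`, its `SortKey` type, the `String` stand-in for the 
code-generated normalized key, and the in-memory `written` list are all 
hypothetical. It also assumes that draining removes entries from the head of 
the sorted map and happens just before an insert that would exceed capacity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of a continuously sorted buffer; not the PR's implementation. */
class ContinuousSortBufferSketch {

  /** Composite key: sort key first, then insertion sequence so equal keys stay distinct. */
  static final class SortKey implements Comparable<SortKey> {
    final String normalizedKey; // stand-in for the code-generated normalized key
    final long sequence;        // insertion order, used as a tie-breaker

    SortKey(String normalizedKey, long sequence) {
      this.normalizedKey = normalizedKey;
      this.sequence = sequence;
    }

    @Override
    public int compareTo(SortKey other) {
      int cmp = normalizedKey.compareTo(other.normalizedKey);
      return cmp != 0 ? cmp : Long.compare(sequence, other.sequence);
    }
  }

  private final TreeMap<SortKey, String> buffer = new TreeMap<>();
  private final List<String> written = new ArrayList<>(); // stand-in for the file writer
  private final int maxCapacity;
  private final int drainSize;
  private long nextSequence = 0;

  ContinuousSortBufferSketch(int maxCapacity, int drainSize) {
    if (maxCapacity <= 0 || drainSize <= 0) {
      throw new IllegalArgumentException("maxCapacity and drainSize must be positive");
    }
    this.maxCapacity = maxCapacity;
    this.drainSize = drainSize;
  }

  /** O(log n) insert; drains up to drainSize entries first if the buffer is full. */
  void add(String key, String record) {
    if (buffer.size() >= maxCapacity) {
      drain(drainSize);
    }
    buffer.put(new SortKey(key, nextSequence++), record);
  }

  /** Writes out everything still buffered, in sorted order. */
  void flush() {
    drain(buffer.size());
  }

  /** Removes up to n entries from the head of the sorted map and "writes" them. */
  private void drain(int n) {
    for (int i = 0; i < n && !buffer.isEmpty(); i++) {
      written.add(buffer.pollFirstEntry().getValue());
    }
  }

  List<String> written() {
    return written;
  }
}
```

   In this sketch a larger drain size trades fewer drain calls for larger 
write bursts; the insertion sequence in the key keeps records with equal sort 
keys from overwriting each other in the map.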
   
   ### Impact
   
   **New Configuration Options:**
   - `write.buffer.sort.continuous.enabled` (default: `false`) - Enables 
continuous sorting mode
   - `write.buffer.sort.continuous.drain.size` (default: `1`) - Controls drain 
batch size
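
   As an illustration of enabling the feature, the two option keys above could 
be collected into a plain options map like the one below. The helper class and 
method names are hypothetical, and how the map reaches the writer (for example 
as table options in a Flink SQL `WITH` clause) depends on the job setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles the option keys introduced by this PR. */
class ContinuousSortOptionsSketch {

  /** Returns options that enable continuous sorting with the given drain size. */
  static Map<String, String> continuousSortOptions(int drainSize) {
    Map<String, String> options = new LinkedHashMap<>();
    // Disabled by default, so it must be switched on explicitly.
    options.put("write.buffer.sort.continuous.enabled", "true");
    // Number of records drained each time max capacity is reached (default: 1).
    options.put("write.buffer.sort.continuous.drain.size", String.valueOf(drainSize));
    return options;
  }
}
```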
   
   No breaking changes to existing functionality. The feature is disabled by 
default.
   
   ### Risk Level
   
   low - This is a new optional feature that is disabled by default. Existing 
behavior is unchanged unless the user explicitly enables continuous sorting.
   
   ### Documentation Update
   
   The config descriptions are included in the code. A documentation update 
for the Hudi website may be needed to describe the new continuous sorting 
feature and its configuration options.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable

