suryaprasanna opened a new issue, #18366:
URL: https://github.com/apache/hudi/issues/18366

   ### Feature Description
   
   **What the feature achieves:**
   Add support in HoodieIncrSource to return complete latest-state rows for 
modified records across the requested commit range, or expose an option that 
enables this behavior.
   
   **Why this feature is needed:**
   HoodieIncrSource currently reads upstream Hudi tables through the normal 
incremental reader path. Apache Hudi supports incremental read formats such as 
latest_state and cdc at the datasource level, but 
HoodieStreamer/HoodieIncrSource does not provide a way to return a complete 
latest-state view for modified records when sparse updates span multiple 
commits.
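
For reference, the datasource-level incremental read mentioned above can be sketched as a set of read options. This is a minimal sketch, not the proposed HoodieIncrSource change: the helper name `incremental_read_options` is hypothetical, and the option keys are the Spark datasource keys as I understand them from the Hudi documentation, so verify them against the Hudi version in use.

```python
from typing import Dict, Optional


def incremental_read_options(
    begin_instant: str,
    end_instant: Optional[str] = None,
    incremental_format: str = "latest_state",
) -> Dict[str, str]:
    """Build Spark datasource options for a Hudi incremental query.

    Hypothetical helper for illustration only. The option keys are assumed
    to match the Hudi Spark datasource read options; check your version.
    """
    if incremental_format not in ("latest_state", "cdc"):
        raise ValueError(f"unsupported incremental format: {incremental_format}")

    opts = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
        "hoodie.datasource.query.incremental.format": incremental_format,
    }
    # The end instant is optional; when omitted the read runs to the latest commit.
    if end_instant is not None:
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


# Assumed usage with an existing SparkSession `spark` and table path `base_path`:
#   spark.read.format("hudi").options(**incremental_read_options("001")).load(base_path)
```

The request in this issue is for HoodieStreamer/HoodieIncrSource to offer an equivalent way to ask for complete latest-state rows, which the options above do not by themselves provide to the streamer.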
   
   This becomes a practical problem when a source table changes from COW to 
MOR. With COW, incremental reads effectively provide the latest state of 
changed records, but with MOR the same downstream pipeline may only receive 
changes within the incremental commit window and may miss values from earlier 
commits for sparse updates. That means downstream or target datasets need extra 
merge logic after the table-type change, which is especially difficult to roll 
out in large data lakes.
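
The gap described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Hudi's reader: commits are modeled as plain dictionaries of sparse column updates, instants are compared as strings, and the window is treated as begin-exclusive and end-inclusive. The names `merge_in_window` and `latest_state_for_modified` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
# A commit is (instant_time, {record_key: changed_columns}).
# A sparse update carries only the columns that changed.
Commit = Tuple[str, Dict[str, Row]]


def merge_in_window(commits: List[Commit], begin: str, end: str) -> Dict[str, Row]:
    """Merge only the changes written in (begin, end].

    Models the behavior the issue describes for MOR: columns last written
    before the window are missing from the result.
    """
    rows: Dict[str, Row] = {}
    for instant, changes in sorted(commits, key=lambda c: c[0]):
        if begin < instant <= end:
            for key, cols in changes.items():
                rows.setdefault(key, {}).update(cols)
    return rows


def latest_state_for_modified(commits: List[Commit], begin: str, end: str) -> Dict[str, Row]:
    """Return full rows as of `end` for keys modified in (begin, end].

    Models the requested behavior: the window only selects which keys to
    return, while values are merged from all commits up to `end`.
    """
    state: Dict[str, Row] = {}
    modified = set()
    for instant, changes in sorted(commits, key=lambda c: c[0]):
        if instant > end:
            break
        for key, cols in changes.items():
            state.setdefault(key, {}).update(cols)
            if instant > begin:
                modified.add(key)
    return {key: state[key] for key in modified}


# Illustrative data: k1 is created in 001, then sparsely updated in 002 and 003.
commits: List[Commit] = [
    ("001", {"k1": {"name": "a", "city": "x"}, "k2": {"name": "c", "city": "z"}}),
    ("002", {"k1": {"city": "y"}}),
    ("003", {"k1": {"name": "b"}}),
]

partial = merge_in_window(commits, begin="002", end="003")
complete = latest_state_for_modified(commits, begin="002", end="003")
print(partial)   # {'k1': {'name': 'b'}}
print(complete)  # {'k1': {'name': 'b', 'city': 'y'}}
```

In this model the window-only merge loses `city` for `k1` because it was last written in commit 002, outside the window, while the latest-state variant returns the full row. A downstream pipeline that receives only the partial row would need its own merge logic against the target, which is the extra work this feature aims to avoid.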
   
   ### User Experience
   
   **How users will use this feature:**
   - Configuration changes needed
   - API changes
   - Usage examples
   
   
   ### Hudi RFC Requirements
   
   **RFC PR link:** (if applicable)
   
   **Why RFC is/isn't needed:**
   - Does this change public interfaces/APIs? (Yes/No)
   - Does this change storage format? (Yes/No)
   - Justification:
   

