Toroidals commented on issue #13000:
URL: https://github.com/apache/hudi/issues/13000#issuecomment-2739447332

   > Can you try to read the table with other engines like Flink, to see if it 
is an engine-specific merging behavior difference. And in your test, ensure there is 
no shuffle between the Kafka source and hudi sink pipeline.
   
   
![Image](https://github.com/user-attachments/assets/cf9547c1-dc3f-4a26-9ee6-343936a656aa)
   Using Flink, the query results are the same. I have ensured that there is no 
shuffle between the Kafka source and the Hudi sink pipeline; the job simply 
converts JSON to RowData and writes it into Hudi.
   
   Could you help me locate the code in Flink where:
   
   - pre-aggregation (precombine) occurs before writing to Hudi, and
   - log files are merged into base files during compaction?
   
   If I could print out the result of each merge, it would help me pinpoint 
the issue.
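
   To clarify what I mean by "pre-aggregation": my understanding is that before a batch is written, records sharing the same key are collapsed and the one with the largest ordering value (the field configured via the `precombine.field` write option) wins. The sketch below is only a simplified, self-contained illustration of that semantics, not Hudi's actual code; the class and method names here are hypothetical:

   ```java
   import java.util.*;

   // Hypothetical simplified record: key, ordering value, payload.
   record Rec(String key, long orderingVal, String payload) {}

   class PrecombineSketch {
       // Collapse a batch so that for each key only the record with the
       // largest ordering value survives (last one wins on ties).
       static List<Rec> precombine(List<Rec> batch) {
           Map<String, Rec> latest = new LinkedHashMap<>();
           for (Rec r : batch) {
               latest.merge(r.key(), r,
                   (oldR, newR) -> newR.orderingVal() >= oldR.orderingVal() ? newR : oldR);
           }
           return new ArrayList<>(latest.values());
       }

       public static void main(String[] args) {
           List<Rec> batch = List.of(
               new Rec("id1", 1, "v1"),
               new Rec("id1", 2, "v2"),   // larger ordering value, should win
               new Rec("id2", 5, "x"));
           precombine(batch).forEach(System.out::println);
       }
   }
   ```

   If the merge during compaction follows a different rule than this per-batch dedup, that mismatch could explain the behavior I am seeing, which is why I would like to print each merge result.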


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
