lklhdu opened a new issue, #7352:
URL: https://github.com/apache/hudi/issues/7352

   **Describe the problem you faced**
   
   I am writing to a Hudi table through Flink with Hive sync enabled. After the 
sync finishes, the row count returned by a query differs greatly from the row 
count of the table in the source database. Can you help me find the cause?
   
   This is my demo code:
   
https://github.com/NetEase/lakehouse-benchmark-ingestion/blob/master/src/main/java/com/netease/arctic/benchmark/ingestion/sink/HudiCatalogSync.java
   
   The source table in the database contains 300808 entries:
   <img width="529" alt="image" 
src="https://user-images.githubusercontent.com/49940747/204985312-ca684cac-a0fb-4121-a7a3-d7e56ba6451e.png">
   This is my Flink job web UI:
   <img width="1216" alt="image" 
src="https://user-images.githubusercontent.com/49940747/204985405-afb10f25-08e2-4aaa-9576-694e7f4d42bd.png">
   But when I query this Hudi table with Trino, I find very little data:
   <img width="673" alt="image" 
src="https://user-images.githubusercontent.com/49940747/204985752-764eaa47-7353-41fa-a606-119672b178a9.png">
   
   
   **Expected behavior**
   
   I am aware of the data loss issue, so I am using the 0.11 patch code from 
@danny0405:
   https://github.com/danny0405/hudi/tree/0.11-patch
   I expect the amount of data in the Hudi table after synchronization to be 
the same as the amount of data in the source table.
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Flink version : 1.14.6
   
   * Hive version : 2.1.1
   
   * Hadoop version : 2.9.2
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : no
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
