[GitHub] [hudi] Jiezhi commented on issue #4190: [SUPPORT]0.10 cow table insert mode cannot merge small files

GitBox Thu, 02 Dec 2021 02:39:18 -0800


Jiezhi commented on issue #4190:
URL: https://github.com/apache/hudi/issues/4190#issuecomment-984504316



   > The pipeline is right, there is a bucket assigner there which would try to 
assign records to small buckets, do you insert continuously and see the file 
size ? The cleaner may clean the old small files then.
   
   I started two jobs that consume the same Kafka topic with different group 
IDs, each sinking to a different Hudi table:
   1: Flink 1.12.2 with Hudi 0.9
   2: Flink 1.13.2 with the latest Hudi version (started at 05:49 PM for testing)
   
   The two jobs' pipelines are the same, but the files under each partition differ:
   Job one keeps 13 files; new data is merged into those files, and they were 
merged into two files (because there is not much data) on the second day.
   Job two only adds new files, and there are now more than 2000 files under 
every partition.
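
For anyone who wants to compare the two tables the same way, here is a minimal sketch of how the per-partition file counts and sizes could be collected. It is not a Hudi API: `summarize_partitions` is a hypothetical helper, and it assumes the table path is reachable as a local (or mounted) directory, that base files end in `.parquet`, and that hidden directories such as `.hoodie` should be skipped.

```python
import os
from collections import defaultdict


def summarize_partitions(table_root, suffix=".parquet"):
    """Count data files and total bytes per partition directory.

    Assumptions (not taken from Hudi itself): the table lives on a local or
    mounted filesystem, and data files are identified by their suffix.
    """
    stats = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, dirnames, filenames in os.walk(table_root):
        # Prune hidden directories (for example the .hoodie metadata folder)
        # in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            # Ignore hidden files such as checksum or marker files.
            if name.startswith(".") or not name.endswith(suffix):
                continue
            # The partition is the directory path relative to the table root;
            # a non-partitioned table reports "." as its only partition.
            partition = os.path.relpath(dirpath, table_root)
            entry = stats[partition]
            entry["files"] += 1
            entry["bytes"] += os.path.getsize(os.path.join(dirpath, name))
    return dict(stats)
```

Running this against both table paths at intervals would show whether the file count per partition stays flat (as in job one) or keeps growing (as in job two), and whether the new files are well below the target file size.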
   
   
   
   
![image](https://user-images.githubusercontent.com/3399929/144405949-d7d2e919-9094-4829-9a3d-0edbb1677668.png)
   
   
![image](https://user-images.githubusercontent.com/3399929/144405759-4317cff4-c479-4d10-861f-112e50e01382.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
