ssomuah commented on issue #1852:
URL: https://github.com/apache/hudi/issues/1852#issuecomment-663646201


   Hi Balaji, I think I've narrowed down my issue somewhat for my MOR table. 
   
   I started again with a fresh table and the initial commits make sense, but 
after a while I notice it's consistently trying to write 300+ files. 
   
   <img width="964" alt="Screen Shot 2020-07-24 at 1 15 17 PM" 
src="https://user-images.githubusercontent.com/2061955/88417393-da14f980-cdaf-11ea-87ab-63f3aafade83.png">
   
   <img width="1398" alt="Screen Shot 2020-07-24 at 1 15 36 PM" 
src="https://user-images.githubusercontent.com/2061955/88417402-de411700-cdaf-11ea-85dd-c10c405851d3.png">
   
   <img width="1411" alt="Screen Shot 2020-07-24 at 1 15 52 PM" 
src="https://user-images.githubusercontent.com/2061955/88417424-e5682500-cdaf-11ea-9c4b-534e27d80c45.png">
   
   
   The individual tasks don't take that long, so I think reducing the number 
of files it's trying to write would help. 
   <img width="1409" alt="Screen Shot 2020-07-24 at 1 16 03 PM" 
src="https://user-images.githubusercontent.com/2061955/88417487-fca71280-cdaf-11ea-9fc0-10a8a074501c.png">
   
   
   I can also see from the CLI that, whether it's doing a compaction or a 
delta commit, I still seem to be writing the same number of files for a 
fraction of the data. 
   <img width="1307" alt="Screen Shot 2020-07-24 at 1 21 36 PM" 
src="https://user-images.githubusercontent.com/2061955/88417841-aa1a2600-cdb0-11ea-808f-d66595af91ea.png">
   
   
   Is there something I can tune to reduce the number of files it breaks the 
data into?
   
   hoodie.logfile.max.size is 256MB
   hoodie.parquet.max.file.size is 256MB
   hoodie.parquet.compression.ratio is the default 0.35
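
   For reference, the three settings above could be written out as writer 
options roughly like this. It is only a sketch: the dict name 
`hudi_size_options` is hypothetical, and it assumes that "256MB" means 
256 * 1024 * 1024 bytes and that the size options are given in bytes.

```python
MB = 1024 * 1024  # assumption: "256MB" above means 256 MiB

# Hypothetical name; the keys and values are the ones reported in this comment.
hudi_size_options = {
    "hoodie.logfile.max.size": str(256 * MB),
    "hoodie.parquet.max.file.size": str(256 * MB),
    "hoodie.parquet.compression.ratio": "0.35",
}

# Print the options in key=value form, e.g. for pasting into a properties file.
for key, value in hudi_size_options.items():
    print(f"{key}={value}")
```

   With a Spark DataFrame writer, a dict like this would typically be passed 
alongside the other Hudi options, for example via `.options(**hudi_size_options)`.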

