[GitHub] [hudi] nsivabalan commented on issue #5223: [SUPPORT] - HUDI clustering - read issues

GitBox Thu, 12 May 2022 16:21:07 -0700


nsivabalan commented on issue #5223:
URL: https://github.com/apache/hudi/issues/5223#issuecomment-1125503984


   I could not reproduce this. I ran a bulk insert with a very small parquet 
max file size, which created 1300 file groups. I then ran another small 
commit, during which I triggered clustering. Before clustering, the SQL tab in 
the Spark UI shows that 1300+ files are read. After clustering, only 5 files 
are read. 
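
For anyone who wants to retry the steps above, here is a rough sketch of the write options they imply. It is not the exact setup used: the table name, field names, and byte sizes are hypothetical, and it assumes clustering was triggered inline rather than through an async or standalone job. The config keys themselves are standard Hudi write and clustering options.

```python
# Sketch of the two writes described above. All values are illustrative
# assumptions; the original comment does not state the settings it used.

MB = 1024 * 1024


def base_options(table_name):
    """Options shared by both writes (field names are hypothetical)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": "id",
        "hoodie.datasource.write.precombine.field": "ts",
    }


def bulk_insert_options(table_name, max_file_bytes=1 * MB):
    """First write: bulk insert with a tiny parquet file cap, so the
    data is spread over many small file groups."""
    opts = base_options(table_name)
    opts.update({
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.parquet.max.file.size": str(max_file_bytes),
    })
    return opts


def clustering_commit_options(table_name,
                              small_file_limit=128 * MB,
                              target_file_bytes=1024 * MB):
    """Second write: a small commit that also runs inline clustering
    (assumed; the comment only says clustering was triggered)."""
    opts = base_options(table_name)
    opts.update({
        "hoodie.datasource.write.operation": "insert",
        "hoodie.clustering.inline": "true",
        # Schedule and run clustering after every commit.
        "hoodie.clustering.inline.max.commits": "1",
        # Files below this size are candidates for clustering.
        "hoodie.clustering.plan.strategy.small.file.limit": str(small_file_limit),
        # Upper bound on the size of each clustered output file.
        "hoodie.clustering.plan.strategy.target.file.max.bytes": str(target_file_bytes),
    })
    return opts
```

Each dictionary would be passed to a Spark writer, for example `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `df` and `base_path` are placeholders. The number of files read before and after can then be compared in the SQL tab of the Spark UI.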
   
   <img width="629" alt="Screen Shot 2022-05-12 at 4 32 29 PM" 
src="https://user-images.githubusercontent.com/513218/168182811-53499b46-3c59-4873-9315-c598819f3a67.png">
   <img width="655" alt="Screen Shot 2022-05-12 at 7 18 17 PM" 
src="https://user-images.githubusercontent.com/513218/168182816-71e424c4-e2df-4688-a69d-eac14e17a588.png">
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
