sgcisco commented on issue #11118:
URL: https://github.com/apache/hudi/issues/11118#issuecomment-2086534021

   @ad1happy2go thanks for your reply. We tried setting the compaction delta-commits threshold to 1 in one of the earlier test runs; in the run we are trying now it is left at the default value of 5.
   
   As another test attempt, we ran the pipeline over several days with a lower ingestion rate of about 600 KB/s and the same Hudi and Spark configuration as above.
   
   The most time-consuming stage is `Building workload profile`, which takes between 2.5 and 12 minutes, averaging around 7 minutes.
   
   ![Screenshot 2024-04-30 at 19 44 
00](https://github.com/apache/hudi/assets/168409126/ceb6353a-b90f-4abd-8111-5477338701d5)
   
   ![Screenshot 2024-04-30 at 20 37 
15](https://github.com/apache/hudi/assets/168409126/03b7fe99-7eba-4a24-b4b6-446a6b527c67)
   
   So in this case the current Structured Streaming micro-batch amounts to roughly 35-40 MB per minute, and workers can scale up to 35 GB of memory and 32 cores.
   Does this look like a sufficient resource configuration for Hudi to handle such a load?
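   A quick back-of-envelope check that the per-minute figure is consistent with the reported ingestion rate (assuming 600 KB/s sustained and 1 MB = 1024 KB):

   ```python
   # Sanity check: 600 KB/s sustained ingestion expressed as MB per minute.
   ingest_rate_kb_s = 600  # reported ingestion rate in KB/s
   mb_per_minute = ingest_rate_kb_s * 60 / 1024
   print(f"{mb_per_minute:.1f} MB/min")  # ~35.2, within the quoted 35-40 MB/min
   ```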

