haripriyarhp commented on issue #8217:
URL: https://github.com/apache/hudi/issues/8217#issuecomment-1482570661

   Update: As per the suggestion given here in the Slack thread 
https://apache-hudi.slack.com/archives/C4D716NPQ/p1679068718111579?thread_ts=1678987926.576359&cid=C4D716NPQ
 
(https://medium.com/@simpsons/efficient-resource-allocation-for-async-table-services-in-hudi-124375d58dc),
 I tried setting the spark scheduler allocation file. The jobs now go to separate 
pools, but I still see little improvement in performance.
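   For context, the allocation file I'm referring to is Spark's fair-scheduler XML 
(enabled with `spark.scheduler.mode=FAIR` and pointed to via 
`spark.scheduler.allocation.file`). A minimal sketch along the lines of the article 
above, with the pool names from my screenshots; the weights and minShare values 
here are illustrative, not my exact settings:

```xml
<?xml version="1.0"?>
<!-- fairscheduler.xml, referenced via spark.scheduler.allocation.file;
     spark.scheduler.mode=FAIR must also be set on the Spark application -->
<allocations>
  <!-- pool the async compaction jobs are submitted to -->
  <pool name="hoodiecompact">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>1</minShare>
  </pool>
  <!-- pool the regular streaming write jobs land in -->
  <pool name="sparkdatasourcewriter">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>1</minShare>
  </pool>
</allocations>
```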
   The job below, which looks for files to compact, goes to the sparkdatasourcewriter 
pool and takes ~10 mins, which increases the duration of the streaming query.
   
![Job1494](https://user-images.githubusercontent.com/109664817/227495049-d9b787a8-973f-4256-939f-a82a0c59d8fc.PNG)
   
   The other jobs go to the hoodiecompact pool
   
![Job1495](https://user-images.githubusercontent.com/109664817/227495382-61ec5f26-91a9-44d6-b166-e011663f8944.PNG)
   
![Job1497](https://user-images.githubusercontent.com/109664817/227495394-932126f3-eb9e-4db1-ad36-cc9a4c2f3f40.PNG)
   
   But still, as you can see, the overall time taken for the queries remains the 
same as before. 
   
![streamingqueries](https://user-images.githubusercontent.com/109664817/227495601-28b3da63-9478-4c1f-8d07-ce625afc7079.PNG)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
