azagrebin opened a new pull request #12980:
URL: https://github.com/apache/flink/pull/12980


   `UnsafeMemoryBudget#verifyEmpty`, called on slot freeing, has to wait for 
the GC of all allocated and released managed memory. If there are many segments 
to collect, the check can take a long time to finish. If slot freeing happens in 
the RPC thread, waiting for GC can block that thread, and the TM risks missing 
its heartbeat.
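   To illustrate why moving such a check off the RPC thread helps, here is a 
minimal sketch, assuming a hypothetical `verifyEmptyBlocking()` that simulates 
a slow, GC-dependent check with a sleep. It is not Flink's code. The caller only 
schedules the check on a dedicated executor with `CompletableFuture.supplyAsync` 
and returns at once, so it is never blocked by the wait.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OffloadedVerification {

    // Hypothetical stand-in for a slow, GC-dependent check such as verifyEmpty().
    static boolean verifyEmptyBlocking() {
        try {
            Thread.sleep(200); // simulate waiting for GC to release managed memory
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    // Runs the check on the given executor; the calling thread only schedules it.
    static CompletableFuture<Boolean> verifyAsync(Executor executor) {
        return CompletableFuture.supplyAsync(OffloadedVerification::verifyEmptyBlocking, executor);
    }

    public static void main(String[] args) {
        // Dedicated daemon thread, so an RPC-like caller thread stays responsive.
        ExecutorService verifier = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "managed-memory-verifier");
            t.setDaemon(true);
            return t;
        });

        long start = System.nanoTime();
        CompletableFuture<Boolean> result = verifyAsync(verifier);
        long callerBlockedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        System.out.println("caller blocked for ~" + callerBlockedMs + " ms");
        System.out.println("memory released: " + result.join());
        verifier.shutdown();
    }
}
```

   In a real component the result would be handled in a callback (for example 
`thenAccept`) instead of `join()`, which is used here only to print it.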
   
   Another problem is that after 
`UnsafeMemoryBudget#RETRIGGER_GC_AFTER_SLEEPS` sleeps, `System.gc()` is called on 
every attempt to run a cleaner, even if there are already detected cleaners to 
run. This triggers many unnecessary GCs in the background.
   
   This PR offloads the verification to a separate thread and calls 
`System.gc()` only if memory cannot be reserved and there are still no cleaners 
to run after a long wait. The timeout for a normal memory reservation is 
increased to 2 seconds. The full reservation, used for verification, gets a 
2-minute timeout.
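   The reservation policy described above can be sketched as follows. This is a 
simplified model with hypothetical names, not the PR's implementation: pending 
cleaners are modeled as a queue of `Runnable`s instead of real GC-driven 
cleaners, and the value of `RETRIGGER_GC_AFTER_SLEEPS` and the exponential 
sleep backoff are assumptions. Only the 2-second and 2-minute timeouts come 
from the text. The key point is that `System.gc()` is requested only when the 
reservation still fails, no cleaners are left to run, and enough sleeps have 
passed.

```java
import java.time.Duration;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

public class MemoryBudgetSketch {
    // Timeouts from the PR description: 2 s for normal reservations,
    // 2 min for the full reservation used by verification.
    static final Duration NORMAL_RESERVE_TIMEOUT = Duration.ofSeconds(2);
    static final Duration FULL_RESERVE_TIMEOUT = Duration.ofMinutes(2);
    // Assumed value: sleeps to wait before (re)requesting a GC.
    static final int RETRIGGER_GC_AFTER_SLEEPS = 9;
    static final long MAX_SLEEP_MS = 100;

    private final long totalSize;
    private final AtomicLong available;
    // Hypothetical: cleaners for released segments that GC has already detected;
    // running one returns that segment's size to the budget.
    private final Queue<Runnable> pendingCleaners = new ConcurrentLinkedQueue<>();
    int gcRequests; // number of System.gc() calls, counted for illustration

    MemoryBudgetSketch(long totalSize) {
        this.totalSize = totalSize;
        this.available = new AtomicLong(totalSize);
    }

    // Simulates GC having detected a released segment of the given size.
    void enqueueCleaner(long size) {
        pendingCleaners.add(() -> available.addAndGet(size));
    }

    private boolean tryReserve(long size) {
        long current;
        do {
            current = available.get();
            if (current < size) {
                return false;
            }
        } while (!available.compareAndSet(current, current - size));
        return true;
    }

    private boolean runPendingCleaners() {
        boolean ranAny = false;
        Runnable cleaner;
        while ((cleaner = pendingCleaners.poll()) != null) {
            cleaner.run();
            ranAny = true;
        }
        return ranAny;
    }

    boolean reserve(long size, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        long sleepMs = 1;
        int sleepsSinceGc = 0;
        while (!tryReserve(size)) {
            if (runPendingCleaners()) {
                continue; // memory came back without any GC; retry at once
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            if (sleepsSinceGc >= RETRIGGER_GC_AFTER_SLEEPS) {
                // Still short of memory and nothing left to clean: only now ask for a GC.
                System.gc();
                gcRequests++;
                sleepsSinceGc = 0;
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            sleepMs = Math.min(sleepMs * 2, MAX_SLEEP_MS);
            sleepsSinceGc++;
        }
        return true;
    }

    // Verification: the whole budget must be reservable, i.e. everything was released.
    boolean verifyEmpty() {
        if (!reserve(totalSize, FULL_RESERVE_TIMEOUT)) {
            return false;
        }
        available.addAndGet(totalSize); // give the full reservation back
        return true;
    }
}
```

   When a cleaner is already pending, the loop runs it and retries immediately 
without requesting a GC, which is the situation where the old behavior 
triggered unnecessary collections.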


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

