dybyte opened a new pull request, #9696:
URL: https://github.com/apache/seatunnel/pull/9696

   Fixes https://github.com/apache/seatunnel/issues/9637
   ### Purpose of this pull request
   
   Fixes three memory leak issues:
   
   1. `RunningJobStateIMap` – Checkpoint-related entries are stored but never removed, so the map grows by about 8,000 entries per day.
   
   2. `pendingJobMasterMap` – Entries are not cleaned up when resource allocation fails, so the map grows by about 200 entries per day.
   
   3. `metricsImap` – Cleanup is skipped if lock acquisition fails, so the map grows by about 40 entries per day.
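
To illustrate the second leak, here is a minimal sketch of removing a pending entry when resource allocation fails. It is not the engine code: `PendingJobCleanupSketch`, `allocate`, and the plain `ConcurrentHashMap` standing in for `pendingJobMasterMap` are hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

final class PendingJobCleanupSketch {
    // Stand-in (assumption) for pendingJobMasterMap: job id -> placeholder value.
    private final Map<Long, String> pendingJobs = new ConcurrentHashMap<>();

    /**
     * Registers the job as pending, then runs the allocation step.
     * If allocation returns false or throws, the pending entry is removed
     * so it cannot accumulate. On success the entry is left in place here.
     */
    public boolean allocate(long jobId, Callable<Boolean> allocation) {
        pendingJobs.put(jobId, "job-master-" + jobId);
        boolean allocated = false;
        try {
            allocated = Boolean.TRUE.equals(allocation.call());
        } catch (Exception e) {
            // An exception during allocation is treated as a failed allocation.
        } finally {
            if (!allocated) {
                pendingJobs.remove(jobId);
            }
        }
        return allocated;
    }

    public int pendingCount() {
        return pendingJobs.size();
    }
}
```

The `finally` block is the point of the sketch: the entry is dropped on every failure path, including exceptions, rather than only on the expected one.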
   
   These changes make sure entries are removed when they are no longer needed, and that cleanup is retried where it can fail, which reduces memory growth in production.
   For `metricsImap`, this PR introduces a background cleanup worker: failed metrics removal tasks are collected in a blocking queue and retried periodically, at an interval set by a new configuration option (`cleanup-retry-interval`).
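
The retry mechanism described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `MetricsCleanupRetrySketch` and its members are hypothetical, a `ConcurrentHashMap` stands in for `metricsImap`, and a single-permit `Semaphore` stands in for the distributed lock so that contention can be reproduced in one thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class MetricsCleanupRetrySketch {
    // Stand-ins (assumptions): a local map and a one-permit semaphore
    // instead of the distributed metrics map and its lock.
    public final Map<Long, String> metrics = new ConcurrentHashMap<>();
    public final Semaphore mapLock = new Semaphore(1);

    // Job ids whose metrics cleanup failed and must be retried.
    private final BlockingQueue<Long> failedCleanups = new LinkedBlockingQueue<>();

    private final ScheduledExecutorService worker =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "metrics-cleanup-retry");
                t.setDaemon(true);
                return t;
            });

    // Removes the job's metrics if the lock is free; reports failure otherwise.
    private boolean tryCleanup(long jobId) {
        if (!mapLock.tryAcquire()) {
            return false;
        }
        try {
            metrics.remove(jobId);
            return true;
        } finally {
            mapLock.release();
        }
    }

    // Called when a job finishes: clean up now, or queue the job id for a later retry.
    public void cleanupOrEnqueue(long jobId) {
        if (!tryCleanup(jobId)) {
            failedCleanups.offer(jobId);
        }
    }

    // One retry pass: drain the queue and re-queue anything that still fails.
    public void retryFailedCleanups() {
        List<Long> batch = new ArrayList<>();
        failedCleanups.drainTo(batch);
        for (long jobId : batch) {
            if (!tryCleanup(jobId)) {
                failedCleanups.offer(jobId);
            }
        }
    }

    // Schedules retry passes; the interval plays the role of cleanup-retry-interval.
    public void start(long retryIntervalSeconds) {
        worker.scheduleWithFixedDelay(
                this::retryFailedCleanups,
                retryIntervalSeconds, retryIntervalSeconds, TimeUnit.SECONDS);
    }

    public void stop() {
        worker.shutdownNow();
    }

    public int pendingRetries() {
        return failedCleanups.size();
    }
}
```

In this sketch, calling `start(10)` would mirror the default interval of 10 seconds. Draining the queue into a batch before retrying keeps a task that fails again from being retried repeatedly within the same pass.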
   
   This is my first time working with the engine codebase, so I might have 
overlooked some details. I’d appreciate any feedback or suggestions.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes.
   A new configuration option is introduced:
   
   `cleanup-retry-interval` – Interval, in seconds, between retries of metrics cleanup after a previous cleanup attempt fails due to lock contention or other issues.
   Default: 10 seconds.
   
   This helps ensure metrics cleanup eventually succeeds under heavy load.
   
   
   ### How was this patch tested?
   
   - Added E2E tests using Testcontainers.
   - Verified cleanup via server logs (direct map inspection not possible in 
this environment).
   
   - **For `metricsImap` cleanup retries, direct verification is challenging** because the test environment (Docker Testcontainers) does not allow internal state inspection and lock contention is non-deterministic.
   I would greatly appreciate any suggestions from reviewers for reliably simulating lock contention in integration tests.
   
   ### Check list
   
   * [ ] If your PR adds any new Jar binary package, please add a License Notice according to the [New License Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
   * [ ] If you are contributing connector code, please check that the following files are updated:
     1. Update [plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties) and add the new connector information to it
     2. Update the pom file of [seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml)
     3. Add a CI label in [label-scope-conf](https://github.com/apache/seatunnel/blob/dev/.github/workflows/labeler/label-scope-conf.yml)
     4. Add an e2e test case in [seatunnel-e2e](https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/)
     5. Update connector [plugin_config](https://github.com/apache/seatunnel/blob/dev/config/plugin_config)

