chihsuan opened a new pull request, #10606:
URL: https://github.com/apache/ozone/pull/10606

   ## What changes were proposed in this pull request?
   
   **Problem.** `ozone freon dfsg` (and `dfsv`) leak one `FileSystem` instance 
per run. `HadoopBaseFreonGenerator` keeps the `FileSystem` in a `ThreadLocal` 
and closes it from `taskLoopCompleted()`, which only runs on the worker-pool 
threads. But `HadoopFsGenerator.call()` and `HadoopFsValidator.call()` also 
call `getFileSystem()` on the main (calling) thread, for `mkdirs(...)` and 
`open(...)` respectively, before launching the workers. That populates the main 
thread's `ThreadLocal`, which `taskLoopCompleted()` never visits, so one 
`FileSystem`, together with its OM/read client and the gRPC channels behind it, 
is leaked on every invocation.
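
The leak mechanism can be reproduced outside Ozone with a plain `ThreadLocal`. The sketch below is only an illustration: `ThreadLocalLeakDemo`, `Resource`, and `run` are hypothetical names, not freon code, and `Resource` stands in for the `FileSystem`. Each worker thread closes its own instance, but the instance created on the calling thread is never closed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the per-thread FileSystem handling described above.
class ThreadLocalLeakDemo {
  static final AtomicInteger OPENED = new AtomicInteger();
  static final AtomicInteger CLOSED = new AtomicInteger();

  // Counts how often it is created and closed (stand-in for a FileSystem).
  static final class Resource implements AutoCloseable {
    Resource() {
      OPENED.incrementAndGet();
    }

    @Override
    public void close() {
      CLOSED.incrementAndGet();
    }
  }

  // One lazily created Resource per thread.
  static final ThreadLocal<Resource> LOCAL = ThreadLocal.withInitial(Resource::new);

  // Simulates one run and returns {opened, closed}.
  static int[] run(int workers) {
    OPENED.set(0);
    CLOSED.set(0);
    LOCAL.remove(); // start from a clean slate on the calling thread

    LOCAL.get(); // setup on the calling thread creates an instance

    Thread[] threads = new Thread[workers];
    for (int i = 0; i < workers; i++) {
      threads[i] = new Thread(() -> {
        Resource r = LOCAL.get(); // each worker creates its own instance
        r.close();                // per-worker cleanup only sees this one
        LOCAL.remove();
      });
      threads[i].start();
    }
    try {
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new int[] {OPENED.get(), CLOSED.get()};
  }

  public static void main(String[] args) {
    int[] counts = run(2);
    // prints "opened=3 closed=2"
    System.out.println("opened=" + counts[0] + " closed=" + counts[1]);
  }
}
```

With two workers the calling thread's instance is the one left open, so the counts differ by exactly one.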
   
   **Fix.** Capture the main-thread `FileSystem` in a local and close it in a 
`finally` that wraps the existing setup and `runTests(...)` body, using 
`org.apache.hadoop.hdds.utils.IOUtils.closeQuietly(...)` so a close failure 
cannot mask a real exception. The change is confined to the two generators that 
touch the FileSystem on the main thread; the worker-thread cleanup path is 
unchanged, and the dir-tree generators (which never touch the FileSystem on the 
main thread) are left alone.
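
The shape of the fix can be sketched in isolation. This is not the patch itself: `CloseInFinallyDemo`, `FlakyResource`, and the local `closeQuietly` helper are hypothetical stand-ins, with the helper assumed to behave like a quiet close that swallows failures. The point is that the resource is closed on every path and that a failing `close()` does not replace the exception thrown by the body.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical illustration of closing a main-thread resource in a finally block.
class CloseInFinallyDemo {

  // Stand-in for a quiet-close helper: close failures are swallowed.
  static void closeQuietly(AutoCloseable c) {
    if (c == null) {
      return;
    }
    try {
      c.close();
    } catch (Exception ignored) {
      // deliberately ignored so the original exception stays visible
    }
  }

  // Resource whose close() always fails, to show that the failure is not propagated.
  static final class FlakyResource implements Closeable {
    boolean closed;

    @Override
    public void close() throws IOException {
      closed = true;
      throw new IOException("close failed");
    }
  }

  // Mirrors the described structure: setup and test body inside try, close in finally.
  static void call(FlakyResource fs, boolean failInBody) {
    try {
      // setup (e.g. creating a directory) and the test run would go here
      if (failInBody) {
        throw new IllegalStateException("real failure");
      }
    } finally {
      closeQuietly(fs);
    }
  }

  public static void main(String[] args) {
    FlakyResource fs = new FlakyResource();
    try {
      call(fs, true);
    } catch (IllegalStateException e) {
      // prints "real failure, closed=true"
      System.out.println(e.getMessage() + ", closed=" + fs.closed);
    }
  }
}
```

Without the quiet close, the `IOException` from `close()` would need separate handling in the `finally` block, which is how a cleanup failure ends up hiding the exception that actually caused the run to fail.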
   
   The high-concurrency `GrpcUtil` "Timed out gracefully shutting down 
connection ... :9858" WARN flood seen in the report comes from Ratis gracefully 
shutting down datanode write channels under load and is tracked separately in 
HDDS-15670; it is out of scope here.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-14474
   
   ## How was this patch tested?
   
   - New unit test `TestHadoopFsClientClose` drives `dfsg` and `dfsv` against a 
local temp dir using an instrumented `FileSystem` (registered via 
`-D fs.file.impl=...`) that counts `initialize`/`close` calls, then asserts that 
every opened `FileSystem` is closed.
   - Verified the test fails without the fix (3 opened, 2 closed; the unclosed 
instance is the one leaked on the main thread) and passes with it.
   - Existing freon unit tests still pass: `TestContentGenerator`, 
`TestOzoneClientKeyListReader`, `TestProgressBar`.
   - `checkstyle` and `apache-rat` are clean for the `ozone-freon` module.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

