aglinxinyuan opened a new issue, #5548:
URL: https://github.com/apache/texera/issues/5548

   ## Background
   
   While reviewing #5447, Yicong-Huang flagged that the `Files.walk(...)` 
stream-leak pattern Copilot caught in my test cleanup helpers may exist 
elsewhere. A sweep of the codebase confirmed one production usage that leaks 
the underlying directory handle:
   
   
`common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/util/dataset/GitVersionControlLocalFileStorage.java`,
 lines 80-85:
   
   ```java
   public static void deleteRepo(Path directoryPath) throws IOException {
       Files.walk(directoryPath)
           .sorted(Comparator.reverseOrder())
           .map(Path::toFile)
           .forEach(File::delete);
   }
   ```
   
   `Files.walk(...)` returns a closeable `java.util.stream.Stream` backed by 
open directory handles. Terminal operations such as `forEach` do not close the 
stream, so without an explicit `close()` nothing guarantees the handles are 
released promptly (for example, when traversal throws partway through). That 
can make temp-dir deletion flaky on Windows and leak file descriptors in 
long-lived JVMs (e.g. the dataset service that calls `deleteRepo` whenever a 
Git-backed dataset is removed).
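
To illustrate why the terminal operation alone is not enough, here is a minimal sketch using a plain in-memory stream with an `onClose` handler standing in for the directory handle. The class and method names (`StreamCloseDemo`, `closedAfterTerminalOp`, `closedAfterTryWithResources`) are hypothetical and not part of the Texera codebase; the sketch only shows the general `java.util.stream.Stream` contract, not the internals of `Files.walk`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the Texera codebase.
class StreamCloseDemo {

    // Runs a terminal operation without closing the stream and reports
    // whether the close handler was invoked.
    static boolean closedAfterTerminalOp() {
        AtomicBoolean closed = new AtomicBoolean(false);
        Stream<String> stream = Stream.of("a", "b").onClose(() -> closed.set(true));
        stream.forEach(s -> { /* consume every element */ });
        return closed.get();
    }

    // Same pipeline, but wrapped in try-with-resources so close() is called
    // when the block exits, whether normally or via an exception.
    static boolean closedAfterTryWithResources() {
        AtomicBoolean closed = new AtomicBoolean(false);
        try (Stream<String> stream = Stream.of("a", "b").onClose(() -> closed.set(true))) {
            stream.forEach(s -> { /* consume every element */ });
        }
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println("closed after forEach only:      " + closedAfterTerminalOp());
        System.out.println("closed after try-with-resources: " + closedAfterTryWithResources());
    }
}
```

The first method leaves the close handler unrun even though every element was consumed; the second runs it on block exit.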
   
   Every other `Files.walk` / `Files.list` usage in the codebase already wraps 
the stream in `try/finally` and closes it (verified across `PveManager.scala`, 
`HuggingFaceModelResource.scala`, `HuggingFaceModelResourceSpec.scala`).
   
   ## What needs to change
   
   Convert `deleteRepo` to use try-with-resources so the stream is closed even 
if iteration throws:
   
   ```java
   public static void deleteRepo(Path directoryPath) throws IOException {
       try (var stream = Files.walk(directoryPath)) {
           stream
               .sorted(Comparator.reverseOrder())
               .map(Path::toFile)
               .forEach(File::delete);
       }
   }
   ```
   
   No behavior change for callers; just the stream-lifecycle fix.
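
For anyone who wants to try the change in isolation, the following is a self-contained sketch that copies the proposed method body into a hypothetical `DeleteRepoSketch` class and runs it against a throwaway temp directory. The directory layout and file names are made up for illustration, and `Files.writeString` assumes Java 11 or newer.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical wrapper class; the real method lives in
// GitVersionControlLocalFileStorage.
class DeleteRepoSketch {

    // Same logic as the proposed fix: walk the tree, delete children before
    // parents (reverse path order), and close the stream on exit.
    static void deleteRepo(Path directoryPath) throws IOException {
        try (Stream<Path> stream = Files.walk(directoryPath)) {
            stream
                .sorted(Comparator.reverseOrder())
                .map(Path::toFile)
                .forEach(File::delete);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small nested tree in a temp directory (illustrative layout).
        Path root = Files.createTempDirectory("delete-repo-sketch");
        Files.createDirectories(root.resolve("objects/pack"));
        Files.writeString(root.resolve("objects/pack/data.bin"), "payload");
        Files.writeString(root.resolve("HEAD"), "ref: refs/heads/main");

        deleteRepo(root);

        System.out.println("root still exists: " + Files.exists(root));
    }
}
```

Note that `File::delete` reports failure through its boolean return value, which this pipeline ignores just as the original does; that is left unchanged here to keep the fix limited to the stream lifecycle.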
   
   ## Scope
   
   - Single-method edit in `GitVersionControlLocalFileStorage.java`.
   - No new tests required (existing dataset deletion paths already exercise 
this code).

