NianticRyan opened a new issue, #24098:
URL: https://github.com/apache/beam/issues/24098
### What happened?

Hi, I recently ran into an issue when upgrading my Apache Beam SDK from 2.30.0 to 2.40.0: a file descriptor leak. Roughly every time a new pipeline was started, Java opened a new set of file descriptors for the library JARs on the classpath. This left many duplicate file descriptors open, and the process eventually hit the maximum number of open file descriptors.

I was able to narrow it down to the bump from 2.35.0 to 2.36.0, and I am 99% certain that [this commit](https://github.com/apache/beam/commit/970bdc0ed7142f5263e9a78fc4d715d50539e7ef) is the source. The main difference in this code is that `classGraph.disableNestedJarScanning().addClassLoader(classLoader).scan(1).getClasspathFiles();` creates a new `Scanner` object with `performScan=true`, whereas `classGraph.disableNestedJarScanning().addClassLoader(classLoader).getClasspathFiles();` creates one with `performScan=false`. With `performScan=true`, ClassGraph runs an asynchronous scan of the classpath; with `performScan=false`, it only generates a placeholder `ScanResult` containing the classpath. The scan itself is what creates the leak: the file descriptors are opened by ClassGraph's scanner worker threads.
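The leak pattern described above, a scan that opens a handle to every JAR on the classpath on each pipeline start and keeps it open, can be reproduced without Beam or ClassGraph. The sketch below is a hypothetical stand-in, not Beam or ClassGraph code: it opens `java.io.RandomAccessFile` handles directly (the class at the top of the stack trace in this report) and uses the JDK's `com.sun.management.UnixOperatingSystemMXBean#getOpenFileDescriptorCount()` to watch the count grow. The names `FdLeakDemo`, `leakyScan`, `closingScan`, and `fakeClasspath` are made up for illustration, and the counter only works on Unix-like JVMs that expose that MXBean.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ClassGraph scan; not Beam or ClassGraph code.
public class FdLeakDemo {

    /** Open file descriptors in this JVM, or -1 if the platform MXBean can't report it. */
    static long openFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /**
     * Opens every "JAR" and never closes the handles. The returned list keeps
     * them reachable, much like a result object that is never closed.
     */
    static List<RandomAccessFile> leakyScan(List<File> jars) throws IOException {
        List<RandomAccessFile> handles = new ArrayList<>();
        for (File jar : jars) {
            handles.add(new RandomAccessFile(jar, "r"));
        }
        return handles;
    }

    /** Same work, but try-with-resources releases each handle before returning. */
    static long closingScan(List<File> jars) throws IOException {
        long totalBytes = 0;
        for (File jar : jars) {
            try (RandomAccessFile raf = new RandomAccessFile(jar, "r")) {
                totalBytes += raf.length();
            }
        }
        return totalBytes;
    }

    /** Creates empty temp files that play the role of classpath JARs. */
    static List<File> fakeClasspath(int n) throws IOException {
        List<File> jars = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            File f = File.createTempFile("fake-lib-" + i + "-", ".jar");
            f.deleteOnExit();
            jars.add(f);
        }
        return jars;
    }

    public static void main(String[] args) throws IOException {
        List<File> jars = fakeClasspath(3);
        List<RandomAccessFile> leaked = new ArrayList<>();
        long before = openFdCount();
        // Each loop iteration simulates one pipeline start.
        for (int pipeline = 0; pipeline < 4; pipeline++) {
            closingScan(jars);              // does not grow the fd count
            leaked.addAll(leakyScan(jars)); // grows it by jars.size() each time
        }
        System.out.println("open fds before: " + before + ", after: " + openFdCount());
        for (RandomAccessFile raf : leaked) {
            raf.close();
        }
        System.out.println("after closing leaked handles: " + openFdCount());
    }
}
```

By analogy, the corresponding change on the ClassGraph side would be to close the `ScanResult` returned by `scan(1)` (it implements `AutoCloseable`), for example with try-with-resources. Whether that fully accounts for the leak reported here is an assumption, not something this report confirms.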
Here's one example of an open file descriptor:

```
#697 /default/lib/guice-assistedinject-3.0.jar by thread:ClassGraph-worker-2 on Wed Nov 09 22:10:48 UTC 2022
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:244)
	at nonapi.io.github.classgraph.fileslice.FileSlice.<init>(FileSlice.java:134)
	at nonapi.io.github.classgraph.fileslice.FileSlice.<init>(FileSlice.java:178)
	at nonapi.io.github.classgraph.fastzipfilereader.PhysicalZipFile.<init>(PhysicalZipFile.java:87)
	at nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler$1.newInstance(NestedJarHandler.java:93)
	at nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler$1.newInstance(NestedJarHandler.java:90)
	at nonapi.io.github.classgraph.concurrency.SingletonMap.get(SingletonMap.java:189)
	at nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler$4.newInstance(NestedJarHandler.java:189)
	at nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler$4.newInstance(NestedJarHandler.java:154)
	at nonapi.io.github.classgraph.concurrency.SingletonMap.get(SingletonMap.java:189)
	at io.github.classgraph.ClasspathElementZip.open(ClasspathElementZip.java:162)
	at io.github.classgraph.Scanner$3.processWorkUnit(Scanner.java:595)
	at io.github.classgraph.Scanner$3.processWorkUnit(Scanner.java:567)
	at nonapi.io.github.classgraph.concurrency.WorkQueue.runWorkLoop(WorkQueue.java:246)
	at nonapi.io.github.classgraph.concurrency.WorkQueue.runWorkQueue(WorkQueue.java:161)
	at io.github.classgraph.Scanner.processWorkUnits(Scanner.java:342)
	at io.github.classgraph.Scanner.openClasspathElementsThenScan(Scanner.java:1047)
	at io.github.classgraph.Scanner.call(Scanner.java:1146)
	at io.github.classgraph.Scanner.call(Scanner.java:83)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```

Our configuration
is using Kubernetes to launch Google Cloud Dataflow jobs with the Apache Beam SDK. Please let me know if there is any other information I can provide. Is anyone else hitting a similar issue?

### Issue Priority

Priority: 2

### Issue Component

Component: runner-core

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
