yittg commented on issue #3776: URL: https://github.com/apache/iceberg/issues/3776#issuecomment-998502908
After some digging, I found the direct cause [here](https://github.com/apache/iceberg/blob/5009949ba4377ac5a8572ff7ae70e886c9e33bec/core/src/main/java/org/apache/iceberg/ManifestReader.java?#L100-L102). When we build an `AvroIterable`, we don't set a class loader, so the default `Thread#getContextClassLoader` is used. The task runs in a persistent, shared [thread pool](https://github.com/apache/iceberg/blob/5009949ba4377ac5a8572ff7ae70e886c9e33bec/core/src/main/java/org/apache/iceberg/util/ThreadPools.java?#L60-L62). Its threads keep the context class loader of the job that first used the pool. The Flink TaskManager closes that class loader afterwards, so later tasks running on the same threads see a closed class loader.

However, setting the class loader manually is not enough. Other code may still use the context class loader, and we cannot avoid that. Below is an example exception stack. It is caused by the JDK's `ServiceLoader#load`, which looks up providers through the current thread's context class loader:

<details>
<summary>at javax.xml.parsers.FactoryFinder.findServiceProvider</summary>

```
java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
	at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:159) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
	at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResources(FlinkUserCodeClassLoaders.java:188) ~[flink-dist_2.12-1.13.1.jar:1.13.1]
	at java.util.ServiceLoader$LazyClassPathLookupIterator.nextProviderClass(ServiceLoader.java:1196) ~[?:?]
	at java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1221) ~[?:?]
	at java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1265) ~[?:?]
	at java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1300) ~[?:?]
	at java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1385) ~[?:?]
	at javax.xml.parsers.FactoryFinder$1.run(FactoryFinder.java:287) ~[?:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
	at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:283) ~[?:?]
	at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:261) ~[?:?]
	at javax.xml.parsers.SAXParserFactory.newInstance(SAXParserFactory.java:147) ~[?:?]
	at org.jdom.input.JAXPParserFactory.createParser(JAXPParserFactory.java:125) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at jdk.internal.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
	at org.jdom.input.SAXBuilder.createParser(SAXBuilder.java:585) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.jdom.input.SAXBuilder.build(SAXBuilder.java:460) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.jdom.input.SAXBuilder.build(SAXBuilder.java:807) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at com.aliyun.oss.internal.ResponseParsers.getXmlRootElement(ResponseParsers.java:1015) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at com.aliyun.oss.internal.ResponseParsers.parseListObjects(ResponseParsers.java:1028) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at com.aliyun.oss.internal.ResponseParsers$ListObjectsReponseParser.parse(ResponseParsers.java:562) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at com.aliyun.oss.internal.ResponseParsers$ListObjectsReponseParser.parse(ResponseParsers.java:556) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:152) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at com.aliyun.oss.internal.OSSOperation.doOperation(OSSOperation.java:113) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at com.aliyun.oss.internal.OSSBucketOperation.listObjects(OSSBucketOperation.java:421) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at com.aliyun.oss.OSSClient.listObjects(OSSClient.java:445) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystemStore.listObjects(AliyunOSSFileSystemStore.java:434) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.getFileStatus(AliyunOSSFileSystem.java:273) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.create(AliyunOSSFileSystem.java:115) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:85) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.avro.AvroFileAppender.<init>(AvroFileAppender.java:51) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.avro.Avro$WriteBuilder.build(Avro.java:198) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.ManifestWriter$V1Writer.newAppender(ManifestWriter.java:277) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.ManifestWriter.<init>(ManifestWriter.java:58) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.ManifestWriter.<init>(ManifestWriter.java:34) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.ManifestWriter$V1Writer.<init>(ManifestWriter.java:256) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.ManifestFiles.write(ManifestFiles.java:117) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.SnapshotProducer.newManifestWriter(SnapshotProducer.java:370) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.MergingSnapshotProducer$DataFileFilterManager.newManifestWriter(MergingSnapshotProducer.java:711) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.ManifestFilterManager.filterManifestWithDeletedFiles(ManifestFilterManager.java:383) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.ManifestFilterManager.filterManifest(ManifestFilterManager.java:308) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.ManifestFilterManager.lambda$filterManifests$0(ManifestFilterManager.java:186) ~[dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:405) [dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:71) [dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:311) [dist-1.1-SNAPSHOT.jar:1.1-SNAPSHOT]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
```

</details>

So it looks like we should avoid sharing a thread pool across different jobs.
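The following is a minimal, self-contained sketch of the mechanism described in the comment. It uses only the JDK, with no Iceberg or Flink classes. The two `URLClassLoader`s are hypothetical stand-ins for per-job user-code class loaders, and the single-thread pool stands in for a persistent shared pool.

A pooled thread is created lazily by whichever thread first submits work, and a new `Thread` inherits its creator's context class loader. The worker therefore keeps job A's loader even when job B submits later. In Flink, job A's loader would presumably already be closed by then, which is the situation the `IllegalStateException` in the comment reports.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedPoolClassLoaderDemo {

    // Simulates one job submitting a task: the job's loader is installed as the
    // submitting thread's context class loader, and we return the context class
    // loader that the pooled worker thread actually sees while running the task.
    static ClassLoader runJob(ExecutorService pool, ClassLoader jobLoader) {
        Thread submitter = Thread.currentThread();
        ClassLoader previous = submitter.getContextClassLoader();
        submitter.setContextClassLoader(jobLoader);
        try {
            return pool.submit(() -> Thread.currentThread().getContextClassLoader()).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Restore the submitter's original loader.
            submitter.setContextClassLoader(previous);
        }
    }

    // Two jobs, one after the other, share a single-thread pool. The worker thread
    // is created during job A's submit, so it inherits job A's loader and keeps it
    // for every later task, including job B's.
    static ClassLoader[] runTwoJobsOnSharedPool(ClassLoader jobA, ClassLoader jobB) {
        ExecutorService sharedPool = Executors.newFixedThreadPool(1);
        try {
            return new ClassLoader[] {runJob(sharedPool, jobA), runJob(sharedPool, jobB)};
        } finally {
            sharedPool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for two jobs' user-code class loaders.
        ClassLoader parent = SharedPoolClassLoaderDemo.class.getClassLoader();
        ClassLoader jobA = new URLClassLoader(new URL[0], parent);
        ClassLoader jobB = new URLClassLoader(new URL[0], parent);

        ClassLoader[] seen = runTwoJobsOnSharedPool(jobA, jobB);
        System.out.println("job A's task sees job A's loader: " + (seen[0] == jobA)); // true
        System.out.println("job B's task sees job A's loader: " + (seen[1] == jobA)); // true
    }
}
```

Setting an explicit class loader on a single reader (as with `AvroIterable`) only fixes that one call site. Any library that calls `Thread#getContextClassLoader` on the worker, such as `ServiceLoader#load`, still gets the stale loader.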

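Following the comment's conclusion, one way to stop jobs from sharing pool threads is to give each job its own executor and shut it down when the job is done. The sketch below assumes a hypothetical helper: `newJobPool` and `contextLoaderSeenByJob` are illustrative names, not Iceberg or Flink APIs. As an extra assumption beyond the comment, its `ThreadFactory` sets each worker's context class loader explicitly from the job's loader. It does not rely on inheritance from whichever thread happens to trigger thread creation.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class JobScopedPools {

    // Hypothetical helper: a pool owned by a single job whose workers all use
    // that job's class loader as their context class loader.
    static ExecutorService newJobPool(String jobName, int threads, ClassLoader jobLoader) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, jobName + "-worker-" + counter.getAndIncrement());
            // Daemon threads, so a pool that is never shut down cannot keep the JVM alive.
            t.setDaemon(true);
            // Pin the loader explicitly instead of inheriting it from the creating thread.
            t.setContextClassLoader(jobLoader);
            return t;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }

    // Runs one task on a fresh job-scoped pool and returns the worker's context
    // class loader. The pool is shut down afterwards, so no worker outlives the job.
    static ClassLoader contextLoaderSeenByJob(String jobName, ClassLoader jobLoader) {
        ExecutorService pool = newJobPool(jobName, 2, jobLoader);
        try {
            return pool.submit(() -> Thread.currentThread().getContextClassLoader()).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for two jobs' user-code class loaders.
        ClassLoader parent = JobScopedPools.class.getClassLoader();
        ClassLoader jobA = new URLClassLoader(new URL[0], parent);
        ClassLoader jobB = new URLClassLoader(new URL[0], parent);
        System.out.println(contextLoaderSeenByJob("job-a", jobA) == jobA); // true
        System.out.println(contextLoaderSeenByJob("job-b", jobB) == jobB); // true
    }
}
```

The design trade-off is that threads are started per job instead of being reused across jobs. In exchange, no worker thread can hold on to a class loader from a job that has already finished.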