MonkeyCanCode opened a new pull request, #14590: URL: https://github.com/apache/iceberg/pull/14590
# Summary

Fix a memory leak in Spark caused by `AuthSessionCache` when using Iceberg, and ensure resources get cleaned up.

# Background

I am using Spark Connect, where end users submit their Spark jobs and queries to a remote Spark Connect server. Query runtimes range from seconds to minutes, and the number of queries per user varies as well. In this setup, the end users are the ones who create the Spark session and define the connection info for the Iceberg REST catalog. By default, the Spark Connect server cleans up idle sessions after one hour. What I found is that the memory used by Spark Connect cannot be garbage collected after the server kills idle sessions that have reached the default TTL. After some debugging, this pointed me to a Spark `ClassLoader` leak caused by `AuthSessionCache.java` in Apache Iceberg.

# Changes

1. Fix the Spark `ClassLoader` leak caused by `AuthSessionCache.java`

   The existing `ThreadPools.newExitingWorkerPool` creates a `ScheduledExecutorService` and registers a JVM-level shutdown hook. Through the tasks it manages, this hook can inadvertently hold a strong reference to the session-specific `ClassLoader` in Spark Connect, which prevents that `ClassLoader` from being released. This change replaces `newExitingWorkerPool` with `newScheduledPool`, which creates a thread pool with daemon threads. Based on my understanding, daemon threads do not block the JVM from exiting, which avoids the issue mentioned above.

2. Ensure proper resource cleanup in catalogs

   `CachingCatalog` and `SparkCatalog` now implement `java.io.Closeable`, which allows them to propagate the `close` call to the underlying wrapped catalog. This ensures that any resources referenced by the catalogs are properly released.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
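The two changes described in the pull request can be illustrated with a minimal, JDK-only sketch. It is not the code from the PR and does not use Iceberg's `ThreadPools`, `CachingCatalog`, or `SparkCatalog`. All names here (`CatalogCloseSketch`, `DemoCatalog`, `ResourceCatalog`, `WrappingCatalog`, `daemonFactory`) are hypothetical stand-ins. The sketch assumes that a scheduled pool backed by daemon threads needs no JVM shutdown hook, and that a wrapper forwards `close` to its delegate only when the delegate is `Closeable`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these types are Iceberg or Spark classes.
final class CatalogCloseSketch {

  /** Stand-in for a catalog interface. */
  interface DemoCatalog {
    String name();
  }

  /** Thread factory producing daemon threads named prefix-0, prefix-1, ... */
  static ThreadFactory daemonFactory(String prefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread thread = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
      thread.setDaemon(true); // daemon threads do not keep the JVM alive
      return thread;
    };
  }

  /** Catalog that owns a scheduler, loosely analogous to a session cache. */
  static final class ResourceCatalog implements DemoCatalog, Closeable {
    private final ScheduledExecutorService executor;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    ResourceCatalog(String threadPrefix) {
      // No shutdown hook is registered here, so nothing JVM-global
      // retains a reference to this pool or to the tasks it holds.
      this.executor = new ScheduledThreadPoolExecutor(1, daemonFactory(threadPrefix));
    }

    ScheduledExecutorService executor() {
      return executor;
    }

    boolean isClosed() {
      return closed.get();
    }

    @Override
    public String name() {
      return "resource";
    }

    @Override
    public void close() {
      // Idempotent: only the first call shuts the pool down.
      if (closed.compareAndSet(false, true)) {
        executor.shutdownNow();
      }
    }
  }

  /** Wrapper that forwards close() to the wrapped catalog when possible. */
  static final class WrappingCatalog implements DemoCatalog, Closeable {
    private final DemoCatalog delegate;

    WrappingCatalog(DemoCatalog delegate) {
      this.delegate = delegate;
    }

    @Override
    public String name() {
      return "wrapped-" + delegate.name();
    }

    @Override
    public void close() throws IOException {
      if (delegate instanceof Closeable) {
        ((Closeable) delegate).close();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ResourceCatalog inner = new ResourceCatalog("demo-evict");
    try (WrappingCatalog outer = new WrappingCatalog(inner)) {
      // A long-delayed task that would otherwise sit in the pool's queue.
      inner.executor().schedule(() -> {}, 1, TimeUnit.HOURS);
    }
    System.out.println(inner.executor().isShutdown()); // prints "true"
  }
}
```

Closing the outer wrapper is enough to shut down the inner pool, so whoever tears down a session only needs a reference to the outermost catalog.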
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
