[
https://issues.apache.org/jira/browse/SPARK-54367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yong Zheng updated SPARK-54367:
-------------------------------
Description:
The Spark Connect server leaks `SparkSession` objects each time a client
connects and disconnects when working with Apache Iceberg ([Apache Iceberg
PR|https://github.com/apache/iceberg/pull/14590]).
The `SessionHolder.close()` method in Spark Connect is responsible for cleaning
up a session. It performs some cleanup, such as removing artifacts and stopping
streaming queries, but it does not clean up the main `SessionState`. This is
where the `CatalogManager` lives, and it holds references to `RESTCatalog` and
`S3FileIO`. Since the `SessionState` is never closed, these `Closeable`
catalogs are never closed and their threads leak.
To fix this, I made the following changes:
1. [Apache Iceberg PR|https://github.com/apache/iceberg/pull/14590] to
implement a closeable catalog
2. Make the `CatalogPlugin` interface extend `java.io.Closeable`
3. Implement a `close()` method in `CatalogManager` that iterates through all
registered catalogs and calls their `close()` methods
4. Make `SessionState` implement `Closeable` and call
`catalogManager.close()` from its `close()` method
5. Invoke `session.sessionState.close()` from `SessionHolder.close()` when a
Spark Connect session is stopped
The above changes create a clean lifecycle for catalogs: when a session ends, a
`close()` call is propagated down the chain, allowing each catalog to release
its resources.
PR: https://github.com/apache/spark/pull/53078
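The propagation chain described in steps 2-5 can be sketched roughly as follows. This is a minimal, self-contained Java sketch using simplified stand-in types; the class names mirror the Spark components involved but are not the actual Spark APIs:

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Step 2 (sketch): the catalog plugin interface extends Closeable, with a
// no-op default so catalogs holding no external resources need no changes.
interface CatalogPlugin extends Closeable {
    @Override
    default void close() {}
}

// Step 3 (sketch): the catalog manager closes every registered catalog.
class CatalogManager implements Closeable {
    private final List<CatalogPlugin> catalogs = new ArrayList<>();

    void register(CatalogPlugin catalog) {
        catalogs.add(catalog);
    }

    @Override
    public void close() {
        for (CatalogPlugin catalog : catalogs) {
            try {
                catalog.close(); // e.g. lets an Iceberg catalog shut down REST clients and S3 I/O threads
            } catch (Exception e) {
                // best effort: one failing catalog must not block the others
            }
        }
        catalogs.clear();
    }
}

// Step 4 (sketch): the session state propagates close() to the catalog manager.
class SessionState implements Closeable {
    final CatalogManager catalogManager = new CatalogManager();

    @Override
    public void close() {
        catalogManager.close();
    }
}

// Step 5 (sketch): the session holder closes the session state when the
// Spark Connect client session is stopped.
class SessionHolder {
    final SessionState sessionState = new SessionState();

    void close() {
        sessionState.close();
    }
}
```

With this chain in place, stopping a Connect session (`SessionHolder.close()`) reaches every registered catalog's `close()`, so resource-holding catalogs no longer outlive the session.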
was:
The Spark Connect server leaks `SparkSession` objects each time a client
connects and disconnects when working with Apache Iceberg ([Apache Iceberg
PR|https://github.com/apache/iceberg/pull/14590]).
The `SessionHolder.close()` method in Spark Connect is responsible for cleaning
up a session. It performs some cleanup, such as removing artifacts and stopping
streaming queries, but it does not clean up the main `SessionState`. This is
where the `CatalogManager` lives, and it holds references to `RESTCatalog` and
`S3FileIO`. Since the `SessionState` is never closed, these `Closeable`
catalogs are never closed and their threads leak.
To fix this, I made the following changes:
1. [Apache Iceberg PR|https://github.com/apache/iceberg/pull/14590] to
implement a closeable catalog
2. Make the `CatalogPlugin` interface extend `java.io.Closeable`
3. Implement a `close()` method in `CatalogManager` that iterates through all
registered catalogs and calls their `close()` methods
4. Make `SessionState` implement `Closeable` and call
`catalogManager.close()` from its `close()` method
5. Invoke `session.sessionState.close()` from `SessionHolder.close()` when a
Spark Connect session is stopped
The above changes create a clean lifecycle for catalogs: when a session ends, a
`close()` call is propagated down the chain, allowing each catalog to release
its resources.
> Propagate close() from SessionState to CatalogPlugins to prevent resource
> leaks
> -------------------------------------------------------------------------------
>
> Key: SPARK-54367
> URL: https://issues.apache.org/jira/browse/SPARK-54367
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.1
> Reporter: Yong Zheng
> Priority: Major
>
> The Spark Connect server leaks `SparkSession` objects each time a client
> connects and disconnects when working with Apache Iceberg ([Apache Iceberg
> PR|https://github.com/apache/iceberg/pull/14590]).
> The `SessionHolder.close()` method in Spark Connect is responsible for
> cleaning up a session. It performs some cleanup, such as removing artifacts
> and stopping streaming queries, but it does not clean up the main
> `SessionState`. This is where the `CatalogManager` lives, and it holds
> references to `RESTCatalog` and `S3FileIO`. Since the `SessionState` is never
> closed, these `Closeable` catalogs are never closed and their threads leak.
> To fix this, I made the following changes:
> 1. [Apache Iceberg PR|https://github.com/apache/iceberg/pull/14590] to
> implement a closeable catalog
> 2. Make the `CatalogPlugin` interface extend `java.io.Closeable`
> 3. Implement a `close()` method in `CatalogManager` that iterates through all
> registered catalogs and calls their `close()` methods
> 4. Make `SessionState` implement `Closeable` and call
> `catalogManager.close()` from its `close()` method
> 5. Invoke `session.sessionState.close()` from `SessionHolder.close()` when a
> Spark Connect session is stopped
> The above changes create a clean lifecycle for catalogs: when a session ends,
> a `close()` call is propagated down the chain, allowing each catalog to
> release its resources.
> PR: https://github.com/apache/spark/pull/53078
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]