LuciferYang opened a new pull request, #10480:
URL: https://github.com/apache/gravitino/pull/10480

   ### What changes were proposed in this pull request?
   Introduce a `ClassLoaderPool` with reference counting to share 
`IsolatedClassLoader` instances across catalogs of the same type, and 
centralize ClassLoader resource cleanup into the pool's lifecycle.
   
   **Core mechanism:** Catalogs with the same provider, package property, 
authorization plugin path, and Kerberos identity share a single 
`IsolatedClassLoader`. The pool uses `ConcurrentHashMap.compute()` for atomic 
acquire/release, and performs cleanup (JDBC driver deregistration, ThreadLocal 
clearing, MySQL `AbandonedConnectionCleanupThread` shutdown) only when the last 
catalog releases the shared ClassLoader.
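
The acquire/release mechanism above can be sketched as a generic reference-counted pool built on `ConcurrentHashMap.compute()`. This is a minimal sketch under stated assumptions, not the PR's code. The class name `RefCountedPool`, its method signatures, and the choice to run cleanup outside `compute()` are all hypothetical. In the PR, the pooled value would be an `IsolatedClassLoader`, and the cleanup callback would perform the JDBC, ThreadLocal, and MySQL cleanup.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical reference-counted pool keyed by K; V stands in for an IsolatedClassLoader. */
final class RefCountedPool<K, V> {

  /** A pooled value and the number of callers currently holding it. */
  private static final class Entry<T> {
    final T value;
    int refCount; // only read and written inside compute(), which is atomic per key

    Entry(T value) {
      this.value = value;
    }
  }

  private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
  private final Consumer<V> cleanup;

  RefCountedPool(Consumer<V> cleanup) {
    this.cleanup = cleanup;
  }

  /** Returns the shared value for the key, creating it on first use, and bumps the count. */
  V acquire(K key, Supplier<V> factory) {
    Entry<V> entry =
        entries.compute(
            key,
            (k, existing) -> {
              Entry<V> e = existing;
              if (e == null) {
                e = new Entry<>(factory.get()); // first holder of this key creates the value
              }
              e.refCount++;
              return e;
            });
    return entry.value;
  }

  /** Drops one reference; the last release removes the entry and runs cleanup once. */
  void release(K key) {
    AtomicReference<V> lastValue = new AtomicReference<>();
    entries.compute(
        key,
        (k, e) -> {
          if (e == null) {
            return null; // entry already gone (e.g. a repeated release): ignore
          }
          if (--e.refCount > 0) {
            return e; // still shared by other holders
          }
          lastValue.set(e.value);
          return null; // returning null removes the mapping atomically
        });
    V value = lastValue.get();
    if (value != null) {
      cleanup.accept(value); // runs outside compute(), after the key is unreachable
    }
  }

  int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    RefCountedPool<String, String> pool =
        new RefCountedPool<>(v -> System.out.println("cleanup " + v));
    String first = pool.acquire("fileset", () -> new String("loader-A"));
    String second = pool.acquire("fileset", () -> new String("loader-B"));
    System.out.println(first == second); // true: the second holder reuses loader-A
    pool.release("fileset"); // 2 -> 1, no cleanup yet
    pool.release("fileset"); // 1 -> 0, prints "cleanup loader-A"
    System.out.println(pool.size()); // 0
  }
}
```

Running the factory inside `compute()` guarantees at most one live instance per key. The cost is that the key's bin lock is held while the value is built.

A release for a key whose entry is already gone is ignored. However, an extra release while other holders still use the key would decrement their count, so each holder must release exactly once.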
   
   **New classes:**
   - `ClassLoaderKey` — composite key for ClassLoader sharing
   - `ClassLoaderPool` — thread-safe pool with reference counting and lifecycle 
management
   - `PooledClassLoaderEntry` — holds a shared ClassLoader and its reference 
count
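
A Java record is one plausible way to express a composite key like `ClassLoaderKey`, because records derive `equals()` and `hashCode()` from every component. This is a sketch, not the PR's class. The component names (`packagePath`, `authorizationPluginPath`, `kerberosPrincipal`) are assumptions standing in for the four inputs the PR lists. The sketch requires Java 16 or later, and `lakehouse-iceberg` is used only as an example provider name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical composite key; a null component means "property not set". */
record ClassLoaderKey(
    String provider,
    String packagePath,
    String authorizationPluginPath,
    String kerberosPrincipal) {

  ClassLoaderKey {
    Objects.requireNonNull(provider, "provider");
  }

  public static void main(String[] args) {
    ClassLoaderKey a = new ClassLoaderKey("lakehouse-iceberg", null, null, null);
    ClassLoaderKey b = new ClassLoaderKey("lakehouse-iceberg", null, null, null);
    ClassLoaderKey c = new ClassLoaderKey("lakehouse-iceberg", null, null, "alice@EXAMPLE.COM");

    // Record equality is component-wise and null-safe, so a and b map to one pool
    // entry, while c, with its own Kerberos identity, gets a separate ClassLoader.
    Set<ClassLoaderKey> distinct = new HashSet<>(List.of(a, b, c));
    System.out.println(distinct.size()); // 2
  }
}
```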
   
   **Changes to existing classes:**
   - `CatalogManager`: integrates the pool into the catalog creation, test-connection, and close paths, and fixes a ClassLoader leak in `testConnection()` and `getResolvedProperties()`
   - `ClassLoaderResourceCleanerUtils`: broadens ThreadLocal cleanup from webserver threads only to all application threads, and adds MySQL `AbandonedConnectionCleanupThread` shutdown
   - Removes the scattered cleanup calls from `JdbcCatalogOperations`, `IcebergCatalogWrapper`, `IcebergCatalogOperations`, and `PaimonCatalogOperations`, since cleanup now happens in the pool's lifecycle
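
To illustrate the MySQL step, the helper below resolves Connector/J's `AbandonedConnectionCleanupThread` through one specific ClassLoader and calls its static `checkedShutdown()` method reflectively. The helper name is hypothetical. It assumes the Connector/J 8.x class name `com.mysql.cj.jdbc.AbandonedConnectionCleanupThread`.

JDBC driver deregistration is not shown. `DriverManager.getDrivers()` only returns drivers to which the current caller has access, so that step likely has to run from code visible to the isolated loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical helper: stop MySQL's cleanup thread loaded by a given ClassLoader. */
final class MysqlCleanupSketch {

  // Assumed Connector/J 8.x location of the cleanup thread class.
  private static final String CLEANUP_CLASS =
      "com.mysql.cj.jdbc.AbandonedConnectionCleanupThread";

  private MysqlCleanupSketch() {}

  /** Returns true if the thread was shut down; false if the driver is absent or the call failed. */
  static boolean shutdownMysqlCleanupThread(ClassLoader loader) {
    final Class<?> cleanupClass;
    try {
      // Resolve through the catalog's loader rather than the server's own classpath.
      cleanupClass = Class.forName(CLEANUP_CLASS, false, loader);
    } catch (ClassNotFoundException e) {
      return false; // this loader cannot see the MySQL driver: nothing to stop
    }
    try {
      // Static no-arg method, so the receiver passed to invoke() is null.
      cleanupClass.getMethod("checkedShutdown").invoke(null);
      return true;
    } catch (ReflectiveOperationException e) {
      // Keep going: one failed step should not block the rest of the cleanup.
      System.err.println("Could not stop MySQL cleanup thread: " + e);
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    // An empty loader whose parent is the bootstrap loader cannot see Connector/J.
    try (URLClassLoader empty = new URLClassLoader(new URL[0], null)) {
      System.out.println(shutdownMysqlCleanupThread(empty)); // false
    }
  }
}
```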
   
   ### Why are the changes needed?
   Concurrent catalog creation with different names but the same provider type 
causes `OutOfMemoryError: Metaspace`. Each catalog creates an independent 
`IsolatedClassLoader` that loads all provider JARs into Metaspace. With 
`MaxMetaspaceSize=512m` (the default) and each Iceberg catalog consuming ~30-80 MB 
of Metaspace, about 10 catalogs are enough to exhaust the limit.
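
A back-of-the-envelope check of the ~10-catalog figure follows. Here $m$ is the Metaspace used per catalog ClassLoader, a symbol introduced for this estimate. The whole 512 MB is optimistically treated as available to catalogs, even though the server's own classes also occupy Metaspace.

```math
N_{\max} \approx \frac{512\ \text{MB}}{m}:\qquad
m = 80\ \text{MB} \Rightarrow N_{\max} \approx 6.4,\quad
m = 50\ \text{MB} \Rightarrow N_{\max} \approx 10.2,\quad
m = 30\ \text{MB} \Rightarrow N_{\max} \approx 17.1
```

At ~50 MB per catalog, this gives about 10 catalogs, consistent with the figure above.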
   
   This patch addresses four root causes:
   1. **No ClassLoader sharing**: same-type catalogs loaded identical classes into separate Metaspace regions.
   2. **ClassLoader leak in `testConnection()`**: the wrapper created for the connection test was never closed.
   3. **Incomplete ThreadLocal cleanup**: only webserver threads were cleaned, so ForkJoinPool and other threads were missed.
   4. **Inconsistent cleanup**: only 2 of 9+ catalog types called `ClassLoaderResourceCleanerUtils`.
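
The fix for the `testConnection()` leak amounts to releasing the shared ClassLoader on every exit path. The sketch below assumes a hypothetical `Lease` type that implements `AutoCloseable`, so `try`-with-resources releases it even when the connection test throws. Its idempotent `close()` mirrors the double-release resilience the unit tests check. None of these names come from the PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of leak-free acquire/release around a connection test. */
final class LeaseSketch {

  /** Stand-in for a pooled ClassLoader entry; only the reference count matters here. */
  static final class SharedLoader {
    final AtomicInteger refCount = new AtomicInteger();

    Lease acquire() {
      refCount.incrementAndGet();
      return new Lease(this);
    }
  }

  /** Gives back one reference exactly once, however many times close() is called. */
  static final class Lease implements AutoCloseable {
    private final SharedLoader owner;
    private final AtomicBoolean released = new AtomicBoolean(false);

    Lease(SharedLoader owner) {
      this.owner = owner;
    }

    @Override
    public void close() {
      if (released.compareAndSet(false, true)) {
        owner.refCount.decrementAndGet();
      }
    }
  }

  /** A connection test that fails; try-with-resources still releases the lease. */
  static void testConnection(SharedLoader loader) {
    try (Lease lease = loader.acquire()) {
      throw new IllegalStateException("connection refused");
    }
  }

  public static void main(String[] args) {
    SharedLoader loader = new SharedLoader();
    try {
      testConnection(loader);
    } catch (IllegalStateException expected) {
      // The failure still reaches the caller, but the reference is not leaked.
    }
    Lease lease = loader.acquire();
    lease.close();
    lease.close(); // no-op: the count cannot drop below zero
    System.out.println(loader.refCount.get()); // prints 0
  }
}
```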
   
   Fix: #10093
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   **Unit tests** (`TestClassLoaderPool`, 15 tests): acquire/release semantics, reference counting, concurrent access with 20 threads, the close-during-acquire race, resilience to double release, and Kerberos key isolation.
   
   **Integration tests** (`TestClassLoaderPoolIntegration`, 3 tests): same-type catalogs share a ClassLoader instance, dropping one catalog does not affect the others, and closing the manager cleans up the pool.
   
   **Existing tests**: `TestCatalogManager` and `TestJdbcCatalogOperations` 
pass without modification.
   
   **Benchmark** (JDK 17, `-XX:MaxMetaspaceSize=512m`, `fileset` provider, 10 
concurrent threads):
   
   Metaspace growth (committed KB):
   
   | Catalogs | Baseline (`main`) | ClassLoaderPool | Reduction |
   |---|---|---|---|
   | 100 | +890 | +261 | 3.4x |
   | 500 | +3,280 | +82 | 40x |
   | 1,000 | +6,416 | +9 | 713x |
   | 5,000 | +13,969 | +67 | 208x |
   | 10,000 | +40,394 | +11 | **3,672x** |
   
   Classes loaded:
   
   | Catalogs | Baseline | ClassLoaderPool | Reduction |
   |---|---|---|---|
   | 1,000 | 21,772 | 11,387 | 48% |
   | 10,000 | 60,373 | 11,961 | **80%** |
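
The two "Reduction" columns use different definitions. The Metaspace table divides baseline growth by pooled growth, while the classes table reports the percentage drop in loaded classes. The formulas below use symbols introduced here, with the 10,000-catalog rows as examples.

```math
\begin{aligned}
\text{Reduction}_{\text{Metaspace}} &= \frac{\Delta M_{\text{baseline}}}{\Delta M_{\text{pool}}}, \qquad \text{e.g. } \frac{40{,}394\ \text{KB}}{11\ \text{KB}} \approx 3{,}672\times \\
\text{Reduction}_{\text{classes}} &= 1 - \frac{C_{\text{pool}}}{C_{\text{baseline}}}, \qquad \text{e.g. } 1 - \frac{11{,}961}{60{,}373} \approx 80\%
\end{aligned}
```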
   
   Baseline Metaspace grows O(N) with the catalog count, while with the pool it stays flat at ~8.7 MB, i.e. O(number of distinct keys). No OOM or performance regression was observed on either version. For Iceberg catalogs (~50 MB per ClassLoader), the baseline OOMs at ~10 catalogs. With the pool, catalogs sharing the same key (same provider, package property, authorization plugin path, and Kerberos identity) reuse a single ClassLoader, so Metaspace scales with the number of distinct configurations rather than the number of catalog instances.
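
The scaling claim can be written as a simple linear model. Here $M_0$ is the server's own Metaspace, $m$ the footprint of one catalog ClassLoader, $N$ the number of catalogs, and $K$ the number of distinct keys. All of these symbols are introduced here, and the model ignores any per-catalog overhead outside the ClassLoader.

```math
M_{\text{baseline}}(N) \approx M_0 + N\,m, \qquad
M_{\text{pool}}(N) \approx M_0 + K\,m, \qquad 1 \le K \le N
```

For example, take 100 Iceberg catalogs that all share one key ($N = 100$, $K = 1$, $m \approx 50$ MB). Without the pool they would need about 5,000 MB on top of $M_0$; with it, about 50 MB.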

