yuqi1129 commented on PR #10480:
URL: https://github.com/apache/gravitino/pull/10480#issuecomment-4118733256

   > Thanks @yuqi1129 for the thorough review. Here is the status of each 
concern.
   > 
   > ### Concern 1: `ClassLoaderKey` missing backend-URI dimensions — Fixed
   > Extended the isolation key with `metastore.uris`, `jdbc-url`, and 
`fs.defaultFS`. Also added `authorization-provider` (determines which 
authorization plugin JARs are loaded).
   > 
   > The full set of built-in default isolation keys (catalog property names) 
is now:
   > 
   > Category   Property keys
   > Classpath  `package` 
([Catalog.PROPERTY_PACKAGE](https://github.com/apache/gravitino/blob/main/api/src/main/java/org/apache/gravitino/Catalog.java?rgh-link-date=2026-03-24T07%3A12%3A45Z#L132)),
 `authorization-provider`
   > Kerberos identity  `authentication.type`, 
`authentication.kerberos.principal`, `authentication.kerberos.keytab-uri`
   > Backend URIs       `metastore.uris`, `jdbc-url`, `fs.defaultFS`
   > These defaults cannot be removed. Operators can add more via a new server 
config:
   > 
   > ```ini
   > gravitino.catalog.classloader.isolation.extra-properties = 
custom.backend.endpoint
   > ```
   > 
   > `ClassLoaderKey` stores isolation properties as a generic `Map<String, 
String>`, decoupled from specific property names — only 
`CatalogManager.buildClassLoaderKey` needs to know which properties matter. 
This makes the pool infrastructure key-agnostic and extensible without 
modifying pool classes.
   > 
   > ### Concern 2: `FileSystem.closeAll()` can disconnect live catalogs — 
Resolved by Concern 1 fix
   > With backend URIs now in the key, catalogs pointing at different HDFS 
clusters get separate ClassLoaders. `doFinalCleanup` only runs when `refCount` 
reaches 0, so `FileSystem.closeAll()` cannot affect live catalogs sharing the 
same ClassLoader.
   > 
   > ### Concern 3: ThreadLocal cross-contamination — Not fully resolved, 
inherent trade-off
   > This is a genuine limitation of ClassLoader sharing. With per-catalog 
ClassLoaders, ThreadLocal values are naturally isolated because each 
ClassLoader loads its own copy of the class. With a shared ClassLoader, 
catalogs on the same thread can see each other's ThreadLocal state.
   > 
   > The Concern 1 fix reduces the blast radius — catalogs sharing a 
ClassLoader now have identical backend configurations, so leaked ThreadLocal 
state is less likely to cause incorrect behavior (e.g., talking to the wrong 
metastore). But it does not eliminate the problem. If a library sets a 
ThreadLocal with per-catalog state (e.g., Hive `SessionState`, Hadoop 
`SecurityContext`), cross-contamination is still possible between catalogs 
sharing the same ClassLoader on the same thread.
   > 
   > **What this PR does:**
   > 
   > * Reduces exposure by isolating catalogs with different backends into 
separate ClassLoaders (Concern 1 fix)
   > * Provides `extra-properties` config as an operational escape hatch — if a 
specific ThreadLocal issue surfaces, operators can add the relevant property 
key to force separation without a code release
   > 
   > **What this PR does not do:**
   > 
   > * It does not guarantee ThreadLocal isolation between catalogs sharing a 
ClassLoader. This is an inherent trade-off of sharing and cannot be fully 
solved at the key level.
   > 
   > **Open question for the community:** Is this trade-off acceptable given 
the Metaspace savings, or should we consider additional safeguards (e.g., a 
per-catalog opt-out property to force a dedicated ClassLoader)? Happy to 
discuss.
   > 
   > ### Concern 4: AWS SDK v2 static state — Not addressed in this PR
   > Low severity, not related to the key design. Can be addressed as a 
follow-up if this one merged.
   > 
   > ### Suggestion 3: Tests for different-backend isolation — Added
   > New tests:
   > 
   > * `testDifferentMetastoreUrisCreateDifferentEntries`
   > * `testSameMetastoreUrisShareEntry`
   > * `testDifferentJdbcUrlsCreateDifferentEntries`
   > * `testDifferentDefaultFsCreateDifferentEntries`
   > * `testKeyWithAuthorizationProvider`
   > 
   > Total: 19 unit tests + 3 integration tests.
   
   I will take time to review it again. Thanks for your quick response. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to