RussellSpitzer opened a new pull request, #16841:
URL: https://github.com/apache/iceberg/pull/16841

   ## Why
   
   Some test harnesses (and other host runtimes) build multiple 
`InMemoryCatalog` objects that should logically refer to the same catalog — for 
example, a runtime that constructs a separate validation catalog alongside its 
primary one, or that clones a session and rebuilds its catalog manager. With 
the existing per-instance state, those objects don't see each other's writes.
   
   This PR adds a way for two instances to share a single in-memory backing 
store within the JVM.
   
   ## What
   
   - Add catalog property `in-memory-catalog.instance-id`. When two 
`InMemoryCatalog` instances are initialized with the same value, they share 
namespaces, tables, views, and the synchronization monitor used for writes. 
When unset, each instance keeps a private store as before.
   - Move per-instance maps into a private `Storage` holder. Hold a `volatile 
Storage` reference so re-initialization is safe to publish.
   - Lock on the `Storage` instance instead of `this`, so all instances 
pointing at the same shared store serialize through the same monitor 
(previously each instance synchronized on its own monitor — a real bug for any 
shared-state usage).
   - `initialize()` is now idempotent: each call selects storage based on 
whether `INSTANCE_ID` is present.
   - `close()` clears the maps only when the instance owns a private store; for 
shared stores it leaves state intact so other instances continue to observe it. 
The new behavior is documented on `close()`.
   - Add a package-private `clearInstanceState(String)` (annotated 
`@VisibleForTesting`) for explicit shared-store cleanup. It is intentionally 
not part of the public surface.
   
   ## Tests
   
   New tests in `TestInMemoryCatalog`:
   - `instanceIdSharesNamespacesAndTablesAcrossInstances` — bidirectional 
sharing of namespaces and tables.
   - `instanceIdSharesViewsAcrossInstances` — bidirectional sharing of views.
   - `distinctInstanceIdsKeepStateIsolated` — different ids stay isolated.
   - `instancesWithoutInstanceIdKeepStateIsolated` — baseline; two catalogs 
with no id remain isolated.
   - `closeClearsPrivateStateButPreservesSharedState` — pins down the new 
`close()` semantics in both modes.
   - `clearInstanceStateRemovesSharedStore` — contract test for the test hook.
   - `clearInstanceStateIsNoOpForUnknownId` — clearing an unregistered id is 
safe.
   - `initializeIsIdempotentWhenInstanceIdRemoved` — re-initializing without an 
id switches to a fresh private store.
   - `concurrentInitializeWithSameInstanceIdSharesStorage` — racing 
initializers converge on a single shared `Storage`.
   
   The existing `CatalogTests` suite (run via `TestInMemoryCatalog`) is 
unchanged and continues to pass.
   
   ## Compatibility
   
   - API: only additions (one public property constant, no method signature 
changes). `revapi` is clean.
   - Behavior: `close()` semantics change is gated on the new property — 
instances that don't set `INSTANCE_ID` behave exactly as before.
   
   ## Validation
   
   - `./gradlew :iceberg-core:test` passes (full module suite, ~20 min)
   - `./gradlew :iceberg-core:revapi` clean
   - `./gradlew :iceberg-core:spotlessCheck` clean
   
   ## Follow-up
   
   A separate PR will use this property in the Spark v4.1 test base to share a 
backing store between Spark's catalog and the validation catalog (current 
draft: [#16839](https://github.com/apache/iceberg/pull/16839); will be rebased 
on this once it merges).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to