dennishuo commented on code in PR #2290:
URL: https://github.com/apache/polaris/pull/2290#discussion_r2280150574


##########
runtime/service/src/main/java/org/apache/polaris/service/admin/PolarisAdminService.java:
##########
@@ -984,8 +946,14 @@ public List<Catalog> listCatalogs() {
 
   /** List all catalogs without checking for permission. */
   private Stream<CatalogEntity> listCatalogsUnsafe() {
-    return loadEntities(
-        PolarisEntityType.CATALOG, PolarisEntitySubType.ANY_SUBTYPE, null, 
CatalogEntity::of);
+    return metaStoreManager
+        .loadEntitiesAll(
+            getCurrentPolarisContext(),
+            null,
+            PolarisEntityType.CATALOG,
+            PolarisEntitySubType.ANY_SUBTYPE,
+            CatalogEntity::of)

Review Comment:
   Are these type-wrappers from 
https://github.com/apache/polaris/pull/2261/files the only usage of 
`transformer` that aren't `Identity` transformations now? Alongside the 
`PolicyEntity.of` in `PolicyCatalog`?
   
   I guess I can see how it's a bit cleaner being able to pass these into the 
`metaStoreManager` and if callers don't end up doing yet another 
`.stream().map(...).toList()` maybe there's avoidance of a bit of memory 
copying between lists, but as we're also starting to clarify better in the 
mailing list discussion about Polaris SPIs, the `PolarisMetaStoreManager` is 
one critical interface boundary, and having the open-ended transformer 
introduces some subtle pitfalls.
   
   Previously, it was already a bit dicey having the Transformer in the 
`BasePersistence::listEntities` method but at least it was contained because 
`BasePersistence` has been de-facto package-private in its usage (i.e. 
IcebergCatalog/PolarisAdminService interacts with MetaStoreManager, not 
BasePersistence), so the blast radius has luckily remained somewhat contained.
   
   One example pitfall is when considering how this pushdown benefits from now 
having the ability to achieve SNAPSHOT_READ isolation for list contents, and 
how that translates into running inside the `runInTransaction` for the 
TransactionalMetaStoreManager branch, then if we're willing to run arbitrary 
transformer functions inside the transactional critical block we need callers 
to "promise" they'll only provide "trivial" transformers.
   
   Otherwise, I could imagine someday a "convenient" heavyweight 
transformation, such as for example, a remote-catalog entity resolver that uses 
a PolarisBaseEntity as just a passthrough facade, leaking in and causing 
problems by making slow external remote-catalog network requests within the 
transactional/snapshot read section.
   
   Also, right now the interfaces permit the MetaStoreManager implementation 
and/or BasePersistence implementation to potentially perform concurrent 
underlying operations if desired to parallelize i.e. sharded list results; 
depending on whether the filter/transformer are applied inline or as 
post-processing, we'd also need to know if the provided functions are 
thread-safe.
   
   If we *really* want to expose it, we should at least document some initial 
basic contraints/guidelines in the method jaavadocs:
   
   1. Must be "lightweight" without network/IO dependencies
   2. Must not cause re-entrant transactions in the database layer
   3. Must be threadsafe
   
   These probably need to apply to both the Transformer and the Filter.
   
   Otherwise, my preferred approach is to change all of these `FooEntity::of` 
transformer use cases to simply pop them out to the caller like:
   
       return metaStoreManager
           .loadEntitiesAll(
               getCurrentPolarisContext(),
               null,
               PolarisEntityType.CATALOG,
               PolarisEntitySubType.ANY_SUBTYPE)  // No 'transformer' arg
          .stream()
          .map(CatalogEntity::of);



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@polaris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to