dennishuo commented on code in PR #2290: URL: https://github.com/apache/polaris/pull/2290#discussion_r2280150574
########## runtime/service/src/main/java/org/apache/polaris/service/admin/PolarisAdminService.java: ########## @@ -984,8 +946,14 @@ public List<Catalog> listCatalogs() { /** List all catalogs without checking for permission. */ private Stream<CatalogEntity> listCatalogsUnsafe() { - return loadEntities( - PolarisEntityType.CATALOG, PolarisEntitySubType.ANY_SUBTYPE, null, CatalogEntity::of); + return metaStoreManager + .loadEntitiesAll( + getCurrentPolarisContext(), + null, + PolarisEntityType.CATALOG, + PolarisEntitySubType.ANY_SUBTYPE, + CatalogEntity::of) Review Comment: Are these type-wrappers from https://github.com/apache/polaris/pull/2261/files the only usage of `transformer` that aren't `Identity` transformations now? Alongside the `PolicyEntity.of` in `PolicyCatalog`? I guess I can see how it's a bit cleaner being able to pass these into the `metaStoreManager` and if callers don't end up doing yet another `.stream().map(...).toList()` maybe there's avoidance of a bit of memory copying between lists, but as we're also starting to clarify better in the mailing list discussion about Polaris SPIs, the `PolarisMetaStoreManager` is one critical interface boundary, and having the open-ended transformer introduces some subtle pitfalls. Previously, it was already a bit dicey having the Transformer in the `BasePersistence::listEntities` method but at least it was contained because `BasePersistence` has been de-facto package-private in its usage (i.e. IcebergCatalog/PolarisAdminService interacts with MetaStoreManager, not BasePersistence), so the blast radius has luckily remained somewhat contained. One example pitfall is when considering how this pushdown benefits from now having the ability to achieve SNAPSHOT_READ isolation for list contents, and how that translates into running inside the `runInTransaction` for the TransactionalMetaStoreManager branch, then if we're willing to run arbitrary transformer functions inside the transactional critical block we need callers to "promise" they'll only provide "trivial" transformers. Otherwise, I could imagine someday a "convenient" heavyweight transformation, such as for example, a remote-catalog entity resolver that uses a PolarisBaseEntity as just a passthrough facade, leaking in and causing problems by being within the transactional/snapshot read section. Also, right now the interfaces permit the MetaStoreManager implementation and/or BasePersistence implementation to potentially perform concurrent underlying operations if desired to parallelize i.e. sharded list results; depending on whether the filter/transformer are applied inline or as post-processing, we'd also need to know if the provided functions are thread-safe. If we *really* want to expose it, we should at least document some initial basic contraints/guidelines in the method jaavadocs: 1. Must be "lightweight" without network/IO dependencies 2. Must not cause re-entrant transactions in the database layer 3. Must be threadsafe These probably need to apply to both the Transformer and the Filter. Otherwise, my preferred approach is to change all of these `FooEntity::of` transformer use cases to simply pop them out to the caller like: return metaStoreManager .loadEntitiesAll( getCurrentPolarisContext(), null, PolarisEntityType.CATALOG, PolarisEntitySubType.ANY_SUBTYPE) // No 'transformer' arg .stream() .map(CatalogEntity::of); ########## runtime/service/src/main/java/org/apache/polaris/service/catalog/policy/PolicyCatalog.java: ########## @@ -162,34 +161,15 @@ public List<PolicyIdentifier> listPolicies(Namespace namespace, PolicyType polic } List<PolarisEntity> catalogPath = resolvedEntities.getRawFullPath(); - List<PolicyEntity> policyEntities = - metaStoreManager - .listEntities( - callContext.getPolarisCallContext(), - PolarisEntity.toCoreList(catalogPath), - PolarisEntityType.POLICY, - PolarisEntitySubType.NULL_SUBTYPE, - PageToken.readEverything()) - .getEntities() - .stream() - .map( - polarisEntityActiveRecord -> - PolicyEntity.of( - metaStoreManager - .loadEntity( - callContext.getPolarisCallContext(), - polarisEntityActiveRecord.getCatalogId(), - polarisEntityActiveRecord.getId(), - polarisEntityActiveRecord.getType()) - .getEntity())) - .filter( - policyEntity -> policyType == null || policyEntity.getPolicyType() == policyType) - .toList(); - - List<PolarisEntity.NameAndId> entities = - policyEntities.stream().map(PolarisEntity::nameAndId).toList(); - - return entities.stream() + return metaStoreManager + .loadEntitiesAll( + callContext.getPolarisCallContext(), + PolarisEntity.toCoreList(catalogPath), + PolarisEntityType.POLICY, + PolarisEntitySubType.NULL_SUBTYPE, + PolicyEntity::of) + .stream() + .filter(policyEntity -> policyType == null || policyEntity.getPolicyType() == policyType) Review Comment: This could be worth adding in a code comment here as a `// TODO` either as an optional followup task or if nothing else, calling attention to the different flavors of listing so that future code changes will put more thought into the choice of listing methods. Ideally any use of these open-ended filters will either be a short-term crutch or a proven "small" use case where we think pushdown isn't worth the complexity. Longer term we could define a more structured definition of pushdown predicates that is still extensible but communicates filter semantics down to the BasePersistence layer enough to work with different database-level indexes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@polaris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org