Hi Artur, I don't know what email client you are using, but your message is not well rendered on the mailing list (almost impossible to read :)).
Regards JB On Thu, Sep 4, 2025 at 4:35 PM artur rakhmatulin <[email protected]> wrote: > > *Proposed changesAdd an additional index idx_entities_lookup to the > Entities metastore table.For Postgresql metastore:CREATE INDEX IF NOT > EXISTS idx_entities_lookup ON entities USING btree (realm_id, > catalog_id, parent_id, type_code, sub_type_code) INCLUDE (id, name);For > H2 metastore:CREATE INDEX IF NOT EXISTS idx_entities_lookup ON entities > (realm_id, catalog_id, parent_id, type_code, sub_type_code, id, > name);BackgroundOriginal request - > https://github.com/apache/polaris/issues/2352 > <https://github.com/apache/polaris/issues/2352>Previous PR with changes - > https://github.com/apache/polaris/pull/2465 > <https://github.com/apache/polaris/pull/2465>Existing indexes - > Idx_entities (realm_id, catalog_id, id)- Idx_locations (realm_id, > parent_id, location_without_scheme)Where Entities is usedEntities is the > main table for storing all primary objects.Descendants of > org.apache.polaris.core.entity.PolarisEntityCore: - CatalogEntity > (org.apache.polaris.core.entity)- CatalogRoleEntity > (org.apache.polaris.core.entity)- GenericTableEntity > (org.apache.polaris.core.entity.table)- IcebergTableLikeEntity > (org.apache.polaris.core.entity.table)- NamespaceEntity > (org.apache.polaris.core.entity)- PolarisBaseEntity > (org.apache.polaris.core.entity)- PolarisEntity > (org.apache.polaris.core.entity)- PolicyEntity > (org.apache.polaris.core.policy)- PrincipalEntity > (org.apache.polaris.core.entity)- PrincipalRoleEntity > (org.apache.polaris.core.entity)- TableLikeEntity > (org.apache.polaris.core.entity.table)- TaskEntity > (org.apache.polaris.core.entity)And potentially, this table is under a > heavy load from all CRUD operations.Potential improvementsAs my experiment > showed, an additional index significantly helps reduce the query SCORE for > listEntities by using an Index Only Scan instead of a Bitmap Heap Scan, > which could potentially have a positive impact on reducing the load on the > DBMS during read operations.The experimentI conducted a simple experiment > and measured performance using EXPLAIN for the postgresql metastore; The > original > comment:https://github.com/apache/polaris/pull/2465#issuecomment-3244547998 > <https://github.com/apache/polaris/pull/2465#issuecomment-3244547998>Test > ENVOS: `24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:53:27 PDT 2025; > root:xnu-11417.121.6~2/RELEASE_ARM64_T6041 arm64`Storage: Docker image > `postgres:17`Rows in total = 899040.Expected rows result = > 200005.ResultsMain build versionEXPLAIN ANALYZE(SELECT id, catalog_id, > parent_id, type_code, name, entity_version, sub_type_code, > create_timestamp, drop_timestamp, purge_timestamp, to_purge_timestamp, > last_update_timestamp, properties, internal_properties, > grant_records_version, location_without_scheme FROM > POLARIS_SCHEMA.ENTITIES WHERE catalog_id = 6911759911282783572 AND > sub_type_code = 0 AND realm_id = 'POLARIS' AND parent_id = > 6911759911282783572 AND type_code = 6);Bitmap Heap Scan on entities > (cost=3960.68..22260.05 rows=44861 width=140) (actual time=16.287..43.851 > rows=200005 loops=1) Recheck Cond: ((realm_id = 'POLARIS'::text) AND > (catalog_id = '6911759911282783572'::bigint) AND (parent_id = > '6911759911282783572'::bigint) AND (type_code = 6)) Filter: (sub_type_code > = 0) Heap Blocks: exact=3848 -> Bitmap Index Scan on constraint_name > (cost=0.00..3949.47 rows=44861 width=0) (actual time=15.792..15.792 > rows=200005 loops=1) Index Cond: ((realm_id = 'POLARIS'::text) AND > (catalog_id = '6911759911282783572'::bigint) AND (parent_id = > '6911759911282783572'::bigint) AND (type_code = 6))Planning Time: 0.239 > msExecution Time: 49.253 msNew build versionCREATE INDEX IF NOT EXISTS > idx_entities_lookup ON entities USING btree (realm_id, catalog_id, > parent_id, type_code, sub_type_code) INCLUDE (id, name);EXPLAIN > ANALYZE(SELECT id, catalog_id, parent_id, type_code, name, > sub_type_code FROM POLARIS_SCHEMA.ENTITIES WHERE catalog_id = > 6911759911282783572 AND sub_type_code = 0 AND realm_id = 'POLARIS' AND > parent_id = 6911759911282783572 AND type_code = 6);Index Only Scan using > idx_entities_lookup on entities (cost=0.42..2975.89 rows=42744 width=50) > (actual time=0.030..29.235 rows=200005 loops=1) Index Cond: ((realm_id = > 'POLARIS'::text) AND (catalog_id = '6911759911282783572'::bigint) AND > (parent_id = '6911759911282783572'::bigint) AND (type_code = 6) AND > (sub_type_code = 0)) Heap Fetches: 0Planning Time: 0.089 msExecution Time: > 35.317 msConclusionUsing `Index Only Scan` instead of `Bitmap Heap Scan` > helps:- reduce query execution time up to 28%;- reducing query cost up to > 87%;Potential risksI conducted my experiments on only 1 thread and only on > one read operation, and possible additional loads on re-indexing operations > during intensive write operations were not taken into account. Such cases > are possible because the Entities table is used for all objects and > roles.AlternativesThe alternative proposed by Dmitri Bourlatchkov > <[email protected]> is to delegate the decision to add the > index to the administrators of the Polaris deployment. This is because each > product is unique, and the resource maintainers should determine the > optimal configuration for their needs.Recommendations can be placed in the > metastore configuration > section:https://polaris.apache.org/in-dev/unreleased/configuring-polaris-for-production/#metastore-configuration > <https://polaris.apache.org/in-dev/unreleased/configuring-polaris-for-production/#metastore-configuration>* > Regards, Artur
