Ah it's much better ;) Thanks :)
Regards JB On Thu, Sep 4, 2025 at 4:55 PM artur rakhmatulin <[email protected]> wrote: > > My apology. > > Proposed changes: > Add an additional index idx_entities_lookup to the Entities metastore table. > > For Postgresql metastore: > CREATE INDEX IF NOT EXISTS idx_entities_lookup > ON entities USING btree (realm_id, catalog_id, parent_id, type_code, > sub_type_code) > INCLUDE (id, name); > For H2 metastore: > CREATE INDEX IF NOT EXISTS idx_entities_lookup > ON entities (realm_id, catalog_id, parent_id, type_code, sub_type_code, > id, name); > > Background: > Original request [1] > Previous PR with changes [2] > > Existing indexes > * Idx_entities (realm_id, catalog_id, id) > * Idx_locations (realm_id, parent_id, location_without_scheme) > > Where Entities is used: > Entities is the main table for storing all primary objects. > > Descendants of org.apache.polaris.core.entity.PolarisEntityCore: > * CatalogEntity (org.apache.polaris.core.entity) > * CatalogRoleEntity (org.apache.polaris.core.entity) > * GenericTableEntity (org.apache.polaris.core.entity.table) > * IcebergTableLikeEntity (org.apache.polaris.core.entity.table) > * NamespaceEntity (org.apache.polaris.core.entity) > * PolarisBaseEntity (org.apache.polaris.core.entity) > * PolarisEntity (org.apache.polaris.core.entity) > * PolicyEntity (org.apache.polaris.core.policy) > * PrincipalEntity (org.apache.polaris.core.entity) > * PrincipalRoleEntity (org.apache.polaris.core.entity) > * TableLikeEntity (org.apache.polaris.core.entity.table) > * TaskEntity (org.apache.polaris.core.entity) > > And potentially, this table is under a heavy load from all CRUD operations. > > Potential improvements: > As my experiment showed, an additional index significantly helps reduce the > query SCORE for listEntities[a] by using an Index Only Scan instead of a > Bitmap Heap Scan, which could potentially have a positive impact on > reducing the load on the DBMS during read operations. > > The experiment: > I conducted a simple experiment and measured performance using EXPLAIN for > the postgresql metastore; > The original comment: [3] > > Potential risks: > I conducted my experiments on only 1 thread and only on one read operation, > and possible additional loads on re-indexing operations during intensive > write operations were not taken into account. Such cases are possible > because the Entities table is used for all objects and roles. > > Alternatives: > The alternative proposed by Dmitri Bourlatchkov is to delegate the decision > to add the index to the administrators of the Polaris deployment. This is > because each product is unique, and the resource maintainers should > determine the optimal configuration for their needs. > > Recommendations can be placed in the metastore configuration section: [4] > > [1]: https://github.com/apache/polaris/issues/2352 > [2]: https://github.com/apache/polaris/pull/2465 > [3]: https://github.com/apache/polaris/pull/2465#issuecomment-3244547998 > [4]: > https://polaris.apache.org/in-dev/unreleased/configuring-polaris-for-production/#metastore-configuration > > Regards, Artur > > The experiment details: > ---- > Test ENV > OS: `24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:53:27 PDT 2025; > root:xnu-11417.121.6~2/RELEASE_ARM64_T6041 arm64` > Storage: Docker image `postgres:17` > Rows in total = 899040. > Expected rows result = 200005. > > Results > Main build version > EXPLAIN ANALYZE > (SELECT id, catalog_id, parent_id, type_code, name, entity_version, > sub_type_code, create_timestamp, drop_timestamp, purge_timestamp, > to_purge_timestamp, last_update_timestamp, properties, internal_properties, > grant_records_version, location_without_scheme > FROM POLARIS_SCHEMA.ENTITIES > WHERE catalog_id = 6911759911282783572 AND sub_type_code = 0 AND realm_id > = 'POLARIS' AND parent_id = 6911759911282783572 AND type_code = 6); > > Bitmap Heap Scan on entities (cost=3960.68..22260.05 rows=44861 width=140) > (actual time=16.287..43.851 rows=200005 loops=1) > Recheck Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id = > '6911759911282783572'::bigint) AND (parent_id = > '6911759911282783572'::bigint) AND (type_code = 6)) > Filter: (sub_type_code = 0) > Heap Blocks: exact=3848 > -> Bitmap Index Scan on constraint_name (cost=0.00..3949.47 rows=44861 > width=0) (actual time=15.792..15.792 rows=200005 loops=1) > Index Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id = > '6911759911282783572'::bigint) AND (parent_id = > '6911759911282783572'::bigint) AND (type_code = 6)) > Planning Time: 0.239 ms > Execution Time: 49.253 ms > > New build version > CREATE INDEX IF NOT EXISTS idx_entities_lookup > ON entities USING btree (realm_id, catalog_id, parent_id, type_code, > sub_type_code) > INCLUDE (id, name); > > EXPLAIN ANALYZE > (SELECT id, catalog_id, parent_id, type_code, name, sub_type_code > FROM POLARIS_SCHEMA.ENTITIES > WHERE catalog_id = 6911759911282783572 AND sub_type_code = 0 AND realm_id > = 'POLARIS' AND parent_id = 6911759911282783572 AND type_code = 6); > > Index Only Scan using idx_entities_lookup on entities (cost=0.42..2975.89 > rows=42744 width=50) (actual time=0.030..29.235 rows=200005 loops=1) > Index Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id = > '6911759911282783572'::bigint) AND (parent_id = > '6911759911282783572'::bigint) AND (type_code = 6) AND (sub_type_code = 0)) > Heap Fetches: 0 > Planning Time: 0.089 ms > Execution Time: 35.317 ms > > Conclusion > Using `Index Only Scan` instead of `Bitmap Heap Scan` helps: > - reduce query execution time up to 28%; > - reducing query cost up to 87%; > ----
