*Proposed changesAdd an additional index idx_entities_lookup to the
Entities metastore table.For Postgresql metastore:CREATE INDEX IF NOT
EXISTS idx_entities_lookup    ON entities USING btree (realm_id,
catalog_id, parent_id, type_code, sub_type_code)    INCLUDE (id, name);For
H2 metastore:CREATE INDEX IF NOT EXISTS idx_entities_lookup    ON entities
(realm_id, catalog_id, parent_id, type_code, sub_type_code, id,
name);BackgroundOriginal request -
https://github.com/apache/polaris/issues/2352
<https://github.com/apache/polaris/issues/2352>Previous PR with changes -
https://github.com/apache/polaris/pull/2465
<https://github.com/apache/polaris/pull/2465>Existing indexes -
Idx_entities (realm_id, catalog_id, id)- Idx_locations (realm_id,
parent_id, location_without_scheme)Where Entities is usedEntities is the
main table for storing all primary objects.Descendants of
org.apache.polaris.core.entity.PolarisEntityCore: - CatalogEntity
(org.apache.polaris.core.entity)- CatalogRoleEntity
(org.apache.polaris.core.entity)- GenericTableEntity
(org.apache.polaris.core.entity.table)- IcebergTableLikeEntity
(org.apache.polaris.core.entity.table)- NamespaceEntity
(org.apache.polaris.core.entity)- PolarisBaseEntity
(org.apache.polaris.core.entity)- PolarisEntity
(org.apache.polaris.core.entity)- PolicyEntity
(org.apache.polaris.core.policy)- PrincipalEntity
(org.apache.polaris.core.entity)- PrincipalRoleEntity
(org.apache.polaris.core.entity)- TableLikeEntity
(org.apache.polaris.core.entity.table)- TaskEntity
(org.apache.polaris.core.entity)And potentially, this table is under a
heavy load from all CRUD operations.Potential improvementsAs my experiment
showed, an additional index significantly helps reduce the query SCORE for
listEntities by using an Index Only Scan instead of a Bitmap Heap Scan,
which could potentially have a positive impact on reducing the load on the
DBMS during read operations.The experimentI conducted a simple experiment
and measured performance using EXPLAIN for the postgresql metastore; The
original
comment:https://github.com/apache/polaris/pull/2465#issuecomment-3244547998
<https://github.com/apache/polaris/pull/2465#issuecomment-3244547998>Test
ENVOS: `24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:53:27 PDT 2025;
root:xnu-11417.121.6~2/RELEASE_ARM64_T6041 arm64`Storage: Docker image
`postgres:17`Rows in total = 899040.Expected rows result =
200005.ResultsMain build versionEXPLAIN ANALYZE(SELECT id, catalog_id,
parent_id, type_code, name, entity_version, sub_type_code,
create_timestamp, drop_timestamp, purge_timestamp, to_purge_timestamp,
last_update_timestamp, properties, internal_properties,
grant_records_version, location_without_scheme FROM
POLARIS_SCHEMA.ENTITIES WHERE catalog_id = 6911759911282783572 AND
sub_type_code = 0 AND realm_id = 'POLARIS' AND parent_id =
6911759911282783572 AND type_code = 6);Bitmap Heap Scan on entities
(cost=3960.68..22260.05 rows=44861 width=140) (actual time=16.287..43.851
rows=200005 loops=1)  Recheck Cond: ((realm_id = 'POLARIS'::text) AND
(catalog_id = '6911759911282783572'::bigint) AND (parent_id =
'6911759911282783572'::bigint) AND (type_code = 6))  Filter: (sub_type_code
= 0)  Heap Blocks: exact=3848  ->  Bitmap Index Scan on constraint_name
(cost=0.00..3949.47 rows=44861 width=0) (actual time=15.792..15.792
rows=200005 loops=1)        Index Cond: ((realm_id = 'POLARIS'::text) AND
(catalog_id = '6911759911282783572'::bigint) AND (parent_id =
'6911759911282783572'::bigint) AND (type_code = 6))Planning Time: 0.239
msExecution Time: 49.253 msNew build versionCREATE INDEX IF NOT EXISTS
idx_entities_lookup    ON entities USING btree (realm_id, catalog_id,
parent_id, type_code, sub_type_code)    INCLUDE (id, name);EXPLAIN
ANALYZE(SELECT id, catalog_id, parent_id, type_code, name,
sub_type_code  FROM POLARIS_SCHEMA.ENTITIES WHERE catalog_id =
6911759911282783572 AND sub_type_code = 0 AND realm_id = 'POLARIS' AND
parent_id = 6911759911282783572 AND type_code = 6);Index Only Scan using
idx_entities_lookup on entities  (cost=0.42..2975.89 rows=42744 width=50)
(actual time=0.030..29.235 rows=200005 loops=1)  Index Cond: ((realm_id =
'POLARIS'::text) AND (catalog_id = '6911759911282783572'::bigint) AND
(parent_id = '6911759911282783572'::bigint) AND (type_code = 6) AND
(sub_type_code = 0))  Heap Fetches: 0Planning Time: 0.089 msExecution Time:
35.317 msConclusionUsing `Index Only Scan` instead of `Bitmap Heap Scan`
helps:- reduce query execution time up to 28%;- reducing query cost up to
87%;Potential risksI conducted my experiments on only 1 thread and only on
one read operation, and possible additional loads on re-indexing operations
during intensive write operations were not taken into account. Such cases
are possible because the Entities table is used for all objects and
roles.AlternativesThe alternative proposed by Dmitri Bourlatchkov
<[email protected]> is to delegate the decision to add the
index to the administrators of the Polaris deployment. This is because each
product is unique, and the resource maintainers should determine the
optimal configuration for their needs.Recommendations can be placed in the
metastore configuration
section:https://polaris.apache.org/in-dev/unreleased/configuring-polaris-for-production/#metastore-configuration
<https://polaris.apache.org/in-dev/unreleased/configuring-polaris-for-production/#metastore-configuration>*
Regards, Artur

Reply via email to