Hi Artur,

I don't know what email client you are using, but your message is not
well rendered on the mailing list (almost impossible to read :)).

Regards
JB

On Thu, Sep 4, 2025 at 4:35 PM artur rakhmatulin
<[email protected]> wrote:
>
> *Proposed changesAdd an additional index idx_entities_lookup to the
> Entities metastore table.For Postgresql metastore:CREATE INDEX IF NOT
> EXISTS idx_entities_lookup    ON entities USING btree (realm_id,
> catalog_id, parent_id, type_code, sub_type_code)    INCLUDE (id, name);For
> H2 metastore:CREATE INDEX IF NOT EXISTS idx_entities_lookup    ON entities
> (realm_id, catalog_id, parent_id, type_code, sub_type_code, id,
> name);BackgroundOriginal request -
> https://github.com/apache/polaris/issues/2352
> <https://github.com/apache/polaris/issues/2352>Previous PR with changes -
> https://github.com/apache/polaris/pull/2465
> <https://github.com/apache/polaris/pull/2465>Existing indexes -
> Idx_entities (realm_id, catalog_id, id)- Idx_locations (realm_id,
> parent_id, location_without_scheme)Where Entities is usedEntities is the
> main table for storing all primary objects.Descendants of
> org.apache.polaris.core.entity.PolarisEntityCore: - CatalogEntity
> (org.apache.polaris.core.entity)- CatalogRoleEntity
> (org.apache.polaris.core.entity)- GenericTableEntity
> (org.apache.polaris.core.entity.table)- IcebergTableLikeEntity
> (org.apache.polaris.core.entity.table)- NamespaceEntity
> (org.apache.polaris.core.entity)- PolarisBaseEntity
> (org.apache.polaris.core.entity)- PolarisEntity
> (org.apache.polaris.core.entity)- PolicyEntity
> (org.apache.polaris.core.policy)- PrincipalEntity
> (org.apache.polaris.core.entity)- PrincipalRoleEntity
> (org.apache.polaris.core.entity)- TableLikeEntity
> (org.apache.polaris.core.entity.table)- TaskEntity
> (org.apache.polaris.core.entity)And potentially, this table is under a
> heavy load from all CRUD operations.Potential improvementsAs my experiment
> showed, an additional index significantly helps reduce the query SCORE for
> listEntities by using an Index Only Scan instead of a Bitmap Heap Scan,
> which could potentially have a positive impact on reducing the load on the
> DBMS during read operations.The experimentI conducted a simple experiment
> and measured performance using EXPLAIN for the postgresql metastore; The
> original
> comment:https://github.com/apache/polaris/pull/2465#issuecomment-3244547998
> <https://github.com/apache/polaris/pull/2465#issuecomment-3244547998>Test
> ENVOS: `24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:53:27 PDT 2025;
> root:xnu-11417.121.6~2/RELEASE_ARM64_T6041 arm64`Storage: Docker image
> `postgres:17`Rows in total = 899040.Expected rows result =
> 200005.ResultsMain build versionEXPLAIN ANALYZE(SELECT id, catalog_id,
> parent_id, type_code, name, entity_version, sub_type_code,
> create_timestamp, drop_timestamp, purge_timestamp, to_purge_timestamp,
> last_update_timestamp, properties, internal_properties,
> grant_records_version, location_without_scheme FROM
> POLARIS_SCHEMA.ENTITIES WHERE catalog_id = 6911759911282783572 AND
> sub_type_code = 0 AND realm_id = 'POLARIS' AND parent_id =
> 6911759911282783572 AND type_code = 6);Bitmap Heap Scan on entities
> (cost=3960.68..22260.05 rows=44861 width=140) (actual time=16.287..43.851
> rows=200005 loops=1)  Recheck Cond: ((realm_id = 'POLARIS'::text) AND
> (catalog_id = '6911759911282783572'::bigint) AND (parent_id =
> '6911759911282783572'::bigint) AND (type_code = 6))  Filter: (sub_type_code
> = 0)  Heap Blocks: exact=3848  ->  Bitmap Index Scan on constraint_name
> (cost=0.00..3949.47 rows=44861 width=0) (actual time=15.792..15.792
> rows=200005 loops=1)        Index Cond: ((realm_id = 'POLARIS'::text) AND
> (catalog_id = '6911759911282783572'::bigint) AND (parent_id =
> '6911759911282783572'::bigint) AND (type_code = 6))Planning Time: 0.239
> msExecution Time: 49.253 msNew build versionCREATE INDEX IF NOT EXISTS
> idx_entities_lookup    ON entities USING btree (realm_id, catalog_id,
> parent_id, type_code, sub_type_code)    INCLUDE (id, name);EXPLAIN
> ANALYZE(SELECT id, catalog_id, parent_id, type_code, name,
> sub_type_code  FROM POLARIS_SCHEMA.ENTITIES WHERE catalog_id =
> 6911759911282783572 AND sub_type_code = 0 AND realm_id = 'POLARIS' AND
> parent_id = 6911759911282783572 AND type_code = 6);Index Only Scan using
> idx_entities_lookup on entities  (cost=0.42..2975.89 rows=42744 width=50)
> (actual time=0.030..29.235 rows=200005 loops=1)  Index Cond: ((realm_id =
> 'POLARIS'::text) AND (catalog_id = '6911759911282783572'::bigint) AND
> (parent_id = '6911759911282783572'::bigint) AND (type_code = 6) AND
> (sub_type_code = 0))  Heap Fetches: 0Planning Time: 0.089 msExecution Time:
> 35.317 msConclusionUsing `Index Only Scan` instead of `Bitmap Heap Scan`
> helps:- reduce query execution time up to 28%;- reducing query cost up to
> 87%;Potential risksI conducted my experiments on only 1 thread and only on
> one read operation, and possible additional loads on re-indexing operations
> during intensive write operations were not taken into account. Such cases
> are possible because the Entities table is used for all objects and
> roles.AlternativesThe alternative proposed by Dmitri Bourlatchkov
> <[email protected]> is to delegate the decision to add the
> index to the administrators of the Polaris deployment. This is because each
> product is unique, and the resource maintainers should determine the
> optimal configuration for their needs.Recommendations can be placed in the
> metastore configuration
> section:https://polaris.apache.org/in-dev/unreleased/configuring-polaris-for-production/#metastore-configuration
> <https://polaris.apache.org/in-dev/unreleased/configuring-polaris-for-production/#metastore-configuration>*
> Regards, Artur

Reply via email to