Ah it's much better ;)

Thanks :)

Regards
JB

On Thu, Sep 4, 2025 at 4:55 PM artur rakhmatulin
<[email protected]> wrote:
>
> My apology.
>
> Proposed changes:
> Add an additional index idx_entities_lookup to the Entities metastore table.
>
> For Postgresql metastore:
> CREATE INDEX IF NOT EXISTS idx_entities_lookup
>     ON entities USING btree (realm_id, catalog_id, parent_id, type_code,
> sub_type_code)
>     INCLUDE (id, name);
> For H2 metastore:
> CREATE INDEX IF NOT EXISTS idx_entities_lookup
>     ON entities (realm_id, catalog_id, parent_id, type_code, sub_type_code,
> id, name);
>
> Background:
> Original request [1]
> Previous PR with changes [2]
>
> Existing indexes
> * Idx_entities (realm_id, catalog_id, id)
> * Idx_locations (realm_id, parent_id, location_without_scheme)
>
> Where Entities is used:
> Entities is the main table for storing all primary objects.
>
> Descendants of org.apache.polaris.core.entity.PolarisEntityCore:
> * CatalogEntity (org.apache.polaris.core.entity)
> * CatalogRoleEntity (org.apache.polaris.core.entity)
> * GenericTableEntity (org.apache.polaris.core.entity.table)
> * IcebergTableLikeEntity (org.apache.polaris.core.entity.table)
> * NamespaceEntity (org.apache.polaris.core.entity)
> * PolarisBaseEntity (org.apache.polaris.core.entity)
> * PolarisEntity (org.apache.polaris.core.entity)
> * PolicyEntity (org.apache.polaris.core.policy)
> * PrincipalEntity (org.apache.polaris.core.entity)
> * PrincipalRoleEntity (org.apache.polaris.core.entity)
> * TableLikeEntity (org.apache.polaris.core.entity.table)
> * TaskEntity (org.apache.polaris.core.entity)
>
> And potentially, this table is under a heavy load from all CRUD operations.
>
> Potential improvements:
> As my experiment showed, an additional index significantly helps reduce the
> query SCORE for listEntities[a] by using an Index Only Scan instead of a
> Bitmap Heap Scan, which could potentially have a positive impact on
> reducing the load on the DBMS during read operations.
>
> The experiment:
> I conducted a simple experiment and measured performance using EXPLAIN for
> the postgresql metastore;
> The original comment: [3]
>
> Potential risks:
> I conducted my experiments on only 1 thread and only on one read operation,
> and possible additional loads on re-indexing operations during intensive
> write operations were not taken into account. Such cases are possible
> because the Entities table is used for all objects and roles.
>
> Alternatives:
> The alternative proposed by Dmitri Bourlatchkov is to delegate the decision
> to add the index to the administrators of the Polaris deployment. This is
> because each product is unique, and the resource maintainers should
> determine the optimal configuration for their needs.
>
> Recommendations can be placed in the metastore configuration section: [4]
>
> [1]: https://github.com/apache/polaris/issues/2352
> [2]: https://github.com/apache/polaris/pull/2465
> [3]: https://github.com/apache/polaris/pull/2465#issuecomment-3244547998
> [4]:
> https://polaris.apache.org/in-dev/unreleased/configuring-polaris-for-production/#metastore-configuration
>
> Regards, Artur
>
> The experiment details:
> ----
> Test ENV
> OS: `24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:53:27 PDT 2025;
> root:xnu-11417.121.6~2/RELEASE_ARM64_T6041 arm64`
> Storage: Docker image `postgres:17`
> Rows in total = 899040.
> Expected rows result = 200005.
>
> Results
> Main build version
> EXPLAIN ANALYZE
> (SELECT id, catalog_id, parent_id, type_code, name, entity_version,
> sub_type_code, create_timestamp, drop_timestamp, purge_timestamp,
> to_purge_timestamp, last_update_timestamp, properties, internal_properties,
> grant_records_version, location_without_scheme
>  FROM POLARIS_SCHEMA.ENTITIES
>  WHERE catalog_id = 6911759911282783572 AND sub_type_code = 0 AND realm_id
> = 'POLARIS' AND parent_id = 6911759911282783572 AND type_code = 6);
>
> Bitmap Heap Scan on entities  (cost=3960.68..22260.05 rows=44861 width=140)
> (actual time=16.287..43.851 rows=200005 loops=1)
>   Recheck Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id =
> '6911759911282783572'::bigint) AND (parent_id =
> '6911759911282783572'::bigint) AND (type_code = 6))
>   Filter: (sub_type_code = 0)
>   Heap Blocks: exact=3848
>   ->  Bitmap Index Scan on constraint_name  (cost=0.00..3949.47 rows=44861
> width=0) (actual time=15.792..15.792 rows=200005 loops=1)
>         Index Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id =
> '6911759911282783572'::bigint) AND (parent_id =
> '6911759911282783572'::bigint) AND (type_code = 6))
> Planning Time: 0.239 ms
> Execution Time: 49.253 ms
>
> New build version
> CREATE INDEX IF NOT EXISTS idx_entities_lookup
>     ON entities USING btree (realm_id, catalog_id, parent_id, type_code,
> sub_type_code)
>     INCLUDE (id, name);
>
> EXPLAIN ANALYZE
> (SELECT id, catalog_id, parent_id, type_code, name, sub_type_code
>  FROM POLARIS_SCHEMA.ENTITIES
>  WHERE catalog_id = 6911759911282783572 AND sub_type_code = 0 AND realm_id
> = 'POLARIS' AND parent_id = 6911759911282783572 AND type_code = 6);
>
> Index Only Scan using idx_entities_lookup on entities  (cost=0.42..2975.89
> rows=42744 width=50) (actual time=0.030..29.235 rows=200005 loops=1)
>   Index Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id =
> '6911759911282783572'::bigint) AND (parent_id =
> '6911759911282783572'::bigint) AND (type_code = 6) AND (sub_type_code = 0))
>   Heap Fetches: 0
> Planning Time: 0.089 ms
> Execution Time: 35.317 ms
>
> Conclusion
> Using `Index Only Scan` instead of `Bitmap Heap Scan` helps:
> - reduce query execution time up to 28%;
> - reducing query cost up to 87%;
> ----

Reply via email to