Sounds good! Let us know if you need any help on the doc/blog publication. Yufei
On Wed, Sep 24, 2025 at 2:26 AM artur rakhmatulin < [email protected]> wrote: > Hi, > > Thanks for your interest in this topic and the valid points you've raised. > I agree that understanding the write-side impact is important, especially > given that the entity table holds most of the data in Polaris. > > Unfortunately, I don't have the time right now to conduct additional > perf-tests on the write side. However, I'm open to further discussion if > others are interested in this topic. > Your suggestion about documenting the scenarios where the extra index > provides benefits makes sense - it would help users make informed decisions > based on their specific use cases. > > Best regards, > Artur > > > On 2025/09/06 00:08:09 Yufei Gu wrote: > > Hi Artur, thanks for sharing the experiment results. Do we have any data > on > > the write side? The entity table holds almost everything in Polaris. We > > cannot ignore the write side impact. Also if the read perf doesn't > improve > > a lot, we may leave the index as is. We can document how the extra index > > helps in certain scenarios. So that users can make a decision per their > use > > cases. Either a blog or a new doc page works for me. > > > > Yufei > > > > > > On Fri, Sep 5, 2025 at 9:54 AM artur rakhmatulin < > > [email protected]> wrote: > > > > > Hi, I hope these clarifications regarding the context and experiment > > > details provide a clearer understanding of my proposal. > > > > > > About idx_entities_lookup: > > > > > > It is proposed to add an index on a limited set of fields used in the > > > listEntities query for JdbcBasePersistenceImpl > > > ("id", "catalog_id", "parent_id", "type_code", "name", "sub_type_code") > > > Changes using these fields instead of all fields used in the > loadEntities > > > method were delivered in my PR [1] > > > > > > The index is aimed at improving the performance of the query used in > the > > > listEntities method of the > > > org.apache.polaris.persistence.relational.jdbc.JdbcBasePersistenceImpl > > > class, which implements the BasePersistence, IntegrationPersistence > > > interfaces. > > > In turn, this method is used in the listEntities method of the > > > org.apache.polaris.core.persistence.AtomicOperationMetaStoreManager > class, > > > which implements the BaseMetaStoreManager interface. > > > The listEntities method, in both AtomicOperationMetaStoreManager and > > > JdbcBasePersistenceImpl classes, is recommended for use when limited > > > information about an Entity is required, while the loadEntities method > is > > > recommended for obtaining all information about an Entity, including > > > properties and internal_properties. > > > Due to its lightweight nature, the listEntities method is used much > more > > > widely, and improving its performance by adding an additional index > will > > > potentially bring much greater impact. > > > > > > About the previous experiment: > > > > > > The experiment cited in comment [2] refers to the time before the > changes > > > were adopted and should be read as “Main build version = [3]” and “New > > > build version = any version after the changes were adopted in main.” > > > > > > New experiment: > > > > > > > psql -U root -d POLARIS -c "select count(1) from > > > POLARIS_SCHEMA.ENTITIES;" > > > count > > > --------- > > > 1020040 > > > > > > > psql -U root -d POLARIS -c "SELECT count(1) FROM > POLARIS_SCHEMA.ENTITIES > > > WHERE catalog_id = 6911759911282783572 AND sub_type_code = 0 AND > realm_id = > > > 'POLARIS' AND parent_id = 6911759911282783572 AND type_code = 6" > > > count > > > ------- > > > 20005 > > > > > > Version c783de9315a8e55bc4dbda9a073dd0a64f9553c0 > > > > > > [4]: > cat test2.sql > > > > > > pgbench -c 10 -j 2 -t 100 -f test1.sql -U root -d POLARIS > > > pgbench (17.6 (Debian 17.6-1.pgdg13+1)) > > > starting vacuum...end. > > > transaction type: test1.sql > > > scaling factor: 1 > > > query mode: simple > > > number of clients: 10 > > > number of threads: 2 > > > maximum number of tries: 1 > > > number of transactions per client: 100 > > > number of transactions actually processed: 1000/1000 > > > number of failed transactions: 0 (0.000%) > > > latency average = 27.275 ms > > > initial connection time = 6.839 ms > > > tps = 366.633291 (without initial connection time) > > > > > > > psql -U root -d POLARIS -c "EXPLAIN ANALYZE $(cat test1.sql)" > > > QUERY PLAN > > > > > > > ----------------------------------------------------------------------------------------------------------------------------- > > > Index Scan using idx_entities on entities (cost=0.42..6.20 rows=1 > > > width=320) (actual time=0.028..5.895 rows=20005 loops=1) > > > Index Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id = > > > '6911759911282783572'::bigint)) > > > Filter: ((parent_id = '6911759911282783572'::bigint) AND > (sub_type_code > > > = 0) AND (type_code = 6)) > > > Rows Removed by Filter: 14 > > > Planning Time: 0.429 ms > > > Execution Time: 6.632 ms > > > > > > > > > Version f6bcbd29e6ef33745d06d3e3bb70f4a7297a1224 > > > > > > [5]: > cat test2.sql > > > > > > > pgbench -c 10 -j 2 -t 100 -f test2.sql -U root -d POLARIS > > > pgbench (17.6 (Debian 17.6-1.pgdg13+1)) > > > starting vacuum...end. > > > transaction type: test2.sql > > > scaling factor: 1 > > > query mode: simple > > > number of clients: 10 > > > number of threads: 2 > > > maximum number of tries: 1 > > > number of transactions per client: 100 > > > number of transactions actually processed: 1000/1000 > > > number of failed transactions: 0 (0.000%) > > > latency average = 8.526 ms > > > initial connection time = 6.840 ms > > > tps = 1172.823796 (without initial connection time) > > > > > > > psql -U root -d POLARIS -c "EXPLAIN ANALYZE $(cat test2.sql)" > > > QUERY PLAN > > > > > > > ---------------------------------------------------------------------------------------------------------------------------- > > > Index Scan using idx_entities on entities (cost=0.42..6.20 rows=1 > > > width=50) (actual time=0.019..4.903 rows=20005 loops=1) > > > Index Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id = > > > '6911759911282783572'::bigint)) > > > Filter: ((parent_id = '6911759911282783572'::bigint) AND > (sub_type_code > > > = 0) AND (type_code = 6)) > > > Rows Removed by Filter: 14 > > > Planning Time: 0.377 ms > > > Execution Time: 5.477 ms > > > > > > > > > > CREATE INDEX IF NOT EXISTS idx_entities_lookup ON entities USING > btree > > > (realm_id, catalog_id, parent_id, type_code, sub_type_code) INCLUDE > (id, > > > name); > > > > > > > pgbench -c 10 -j 2 -t 100 -f test2.sql -U root -d POLARIS > > > pgbench (17.6 (Debian 17.6-1.pgdg13+1)) > > > starting vacuum...end. > > > transaction type: test2.sql > > > scaling factor: 1 > > > query mode: simple > > > number of clients: 10 > > > number of threads: 2 > > > maximum number of tries: 1 > > > number of transactions per client: 100 > > > number of transactions actually processed: 1000/1000 > > > number of failed transactions: 0 (0.000%) > > > latency average = 8.530 ms > > > initial connection time = 8.146 ms > > > tps = 1172.298584 (without initial connection time) > > > > > > > psql -U root -d POLARIS -c "EXPLAIN ANALYZE $(cat test2.sql)" > > > QUERY PLAN > > > > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > > Index Only Scan using idx_entities_lookup on entities > (cost=0.42..6.20 > > > rows=1 width=50) (actual time=0.026..5.215 rows=20005 loops=1) > > > Index Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id = > > > '6911759911282783572'::bigint) AND (parent_id = > > > '6911759911282783572'::bigint) AND (type_code = 6) AND (sub_type_code > = 0)) > > > Heap Fetches: 20001 > > > Planning Time: 0.331 ms > > > Execution Time: 5.785 ms > > > > > > Conclusion: > > > Current results show that reducing columns for the selection helps > decrease > > > query execution time, BUT adding a new index does not affect search > speed. > > > By default, the existing idx_entities index is used along with an > > > additional FILTER, which provides better performance. > > > > > > Alternative for alternative: > > > > > > We can add the new idx_entities_lookup by default and also add a > comment to > > > the documentation, recommending Polaris service maintainers to delete > or > > > optimize it manually if they experience performance issues due to the > extra > > > index. > > > > > > [1]: https://github.com/apache/polaris/pull/2465 > > > [2]: > https://github.com/apache/polaris/pull/2465#issuecomment-3244547998 > > > [3]: > > > > > > > https://github.com/apache/polaris/releases/tag/apache-polaris-1.1.0-incubating-rc0 > > > > > > > > > [4]: > cat test1.sql > > > SELECT id, catalog_id, parent_id, type_code, name, entity_version, > > > sub_type_code, create_timestamp, drop_timestamp, purge_timestamp, > > > to_purge_timestamp, last_update_timestamp, properties, > internal_properties, > > > grant_records_version, location_without_scheme > > > FROM POLARIS_SCHEMA.ENTITIES > > > WHERE catalog_id = 6911759911282783572 AND sub_type_code = 0 AND > realm_id > > > = 'POLARIS' AND parent_id = 6911759911282783572 AND type_code = 6 > > > > > > [5]: > cat test2.sql > > > SELECT id, catalog_id, parent_id, type_code, name, sub_type_code > > > FROM POLARIS_SCHEMA.ENTITIES > > > WHERE catalog_id = 6911759911282783572 AND sub_type_code = 0 AND > realm_id > > > = 'POLARIS' AND parent_id = 6911759911282783572 AND type_code = 6 > > > > > > Thanks. > > > > > >
