Sounds good! Let us know if you need any help on the doc/blog publication.

Yufei


On Wed, Sep 24, 2025 at 2:26 AM artur rakhmatulin <
[email protected]> wrote:

> Hi,
>
> Thanks for your interest in this topic and the valid points you've raised.
> I agree that understanding the write-side impact is important, especially
> given that the entity table holds most of the data in Polaris.
>
> Unfortunately, I don't have the time right now to conduct additional
> perf-tests on the write side. However, I'm open to further discussion if
> others are interested in this topic.
> Your suggestion about documenting the scenarios where the extra index
> provides benefits makes sense - it would help users make informed decisions
> based on their specific use cases.
>
> Best regards,
> Artur
>
>
> On 2025/09/06 00:08:09 Yufei Gu wrote:
> > Hi Artur, thanks for sharing the experiment results. Do we have any data
> on
> > the write side? The entity table holds almost everything in Polaris. We
> > cannot ignore the write side impact. Also if the read perf doesn't
> improve
> > a lot, we may leave the index as is. We can document how the extra index
> > helps in certain scenarios. So that users can make a decision per their
> use
> > cases. Either a blog or a new doc page works for me.
> >
> > Yufei
> >
> >
> > On Fri, Sep 5, 2025 at 9:54 AM artur rakhmatulin <
> > [email protected]> wrote:
> >
> > > Hi, I hope these clarifications regarding the context and experiment
> > > details provide a clearer understanding of my proposal.
> > >
> > > About idx_entities_lookup:
> > >
> > > It is proposed to add an index on a limited set of fields used in the
> > > listEntities query for JdbcBasePersistenceImpl
> > > ("id", "catalog_id", "parent_id", "type_code", "name", "sub_type_code")
> > > Changes using these fields instead of all fields used in the
> loadEntities
> > > method were delivered in my PR [1]
> > >
> > > The index is aimed at improving the performance of the query used in
> the
> > > listEntities method of the
> > > org.apache.polaris.persistence.relational.jdbc.JdbcBasePersistenceImpl
> > > class, which implements the BasePersistence, IntegrationPersistence
> > > interfaces.
> > > In turn, this method is used in the listEntities method of the
> > > org.apache.polaris.core.persistence.AtomicOperationMetaStoreManager
> class,
> > > which implements the BaseMetaStoreManager interface.
> > > The listEntities method, in both AtomicOperationMetaStoreManager and
> > > JdbcBasePersistenceImpl classes, is recommended for use when limited
> > > information about an Entity is required, while the loadEntities method
> is
> > > recommended for obtaining all information about an Entity, including
> > > properties and internal_properties.
> > > Due to its lightweight nature, the listEntities method is used much
> more
> > > widely, and improving its performance by adding an additional index
> will
> > > potentially bring much greater impact.
> > >
> > > About the previous experiment:
> > >
> > > The experiment cited in comment [2] refers to the time before the
> changes
> > > were adopted and should be read as “Main build version = [3]” and “New
> > > build version = any version after the changes were adopted in main.”
> > >
> > > New experiment:
> > >
> > > > psql -U root -d POLARIS -c "select count(1) from
> > > POLARIS_SCHEMA.ENTITIES;"
> > >   count
> > > ---------
> > >  1020040
> > >
> > > > psql -U root -d POLARIS -c "SELECT count(1) FROM
> POLARIS_SCHEMA.ENTITIES
> > > WHERE catalog_id = 6911759911282783572 AND sub_type_code = 0 AND
> realm_id =
> > > 'POLARIS' AND parent_id = 6911759911282783572 AND type_code = 6"
> > >  count
> > > -------
> > >  20005
> > >
> > > Version c783de9315a8e55bc4dbda9a073dd0a64f9553c0
> > >
> > > [4]: > cat test2.sql
> > >
> > > pgbench -c 10 -j 2 -t 100 -f test1.sql -U root -d POLARIS
> > > pgbench (17.6 (Debian 17.6-1.pgdg13+1))
> > > starting vacuum...end.
> > > transaction type: test1.sql
> > > scaling factor: 1
> > > query mode: simple
> > > number of clients: 10
> > > number of threads: 2
> > > maximum number of tries: 1
> > > number of transactions per client: 100
> > > number of transactions actually processed: 1000/1000
> > > number of failed transactions: 0 (0.000%)
> > > latency average = 27.275 ms
> > > initial connection time = 6.839 ms
> > > tps = 366.633291 (without initial connection time)
> > >
> > > > psql -U root -d POLARIS -c "EXPLAIN ANALYZE $(cat test1.sql)"
> > > QUERY PLAN
> > >
> > >
> -----------------------------------------------------------------------------------------------------------------------------
> > >  Index Scan using idx_entities on entities  (cost=0.42..6.20 rows=1
> > > width=320) (actual time=0.028..5.895 rows=20005 loops=1)
> > >    Index Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id =
> > > '6911759911282783572'::bigint))
> > >    Filter: ((parent_id = '6911759911282783572'::bigint) AND
> (sub_type_code
> > > = 0) AND (type_code = 6))
> > >    Rows Removed by Filter: 14
> > >  Planning Time: 0.429 ms
> > >  Execution Time: 6.632 ms
> > >
> > >
> > > Version f6bcbd29e6ef33745d06d3e3bb70f4a7297a1224
> > >
> > > [5]: > cat test2.sql
> > >
> > > > pgbench -c 10 -j 2 -t 100 -f test2.sql -U root -d POLARIS
> > > pgbench (17.6 (Debian 17.6-1.pgdg13+1))
> > > starting vacuum...end.
> > > transaction type: test2.sql
> > > scaling factor: 1
> > > query mode: simple
> > > number of clients: 10
> > > number of threads: 2
> > > maximum number of tries: 1
> > > number of transactions per client: 100
> > > number of transactions actually processed: 1000/1000
> > > number of failed transactions: 0 (0.000%)
> > > latency average = 8.526 ms
> > > initial connection time = 6.840 ms
> > > tps = 1172.823796 (without initial connection time)
> > >
> > > > psql -U root -d POLARIS -c "EXPLAIN ANALYZE $(cat test2.sql)"
> > > QUERY PLAN
> > >
> > >
> ----------------------------------------------------------------------------------------------------------------------------
> > >  Index Scan using idx_entities on entities  (cost=0.42..6.20 rows=1
> > > width=50) (actual time=0.019..4.903 rows=20005 loops=1)
> > >    Index Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id =
> > > '6911759911282783572'::bigint))
> > >    Filter: ((parent_id = '6911759911282783572'::bigint) AND
> (sub_type_code
> > > = 0) AND (type_code = 6))
> > >    Rows Removed by Filter: 14
> > >  Planning Time: 0.377 ms
> > >  Execution Time: 5.477 ms
> > >
> > >
> > > > CREATE INDEX IF NOT EXISTS idx_entities_lookup ON entities USING
> btree
> > > (realm_id, catalog_id, parent_id, type_code, sub_type_code) INCLUDE
> (id,
> > > name);
> > >
> > > > pgbench -c 10 -j 2 -t 100 -f test2.sql -U root -d POLARIS
> > > pgbench (17.6 (Debian 17.6-1.pgdg13+1))
> > > starting vacuum...end.
> > > transaction type: test2.sql
> > > scaling factor: 1
> > > query mode: simple
> > > number of clients: 10
> > > number of threads: 2
> > > maximum number of tries: 1
> > > number of transactions per client: 100
> > > number of transactions actually processed: 1000/1000
> > > number of failed transactions: 0 (0.000%)
> > > latency average = 8.530 ms
> > > initial connection time = 8.146 ms
> > > tps = 1172.298584 (without initial connection time)
> > >
> > > > psql -U root -d POLARIS -c "EXPLAIN ANALYZE $(cat test2.sql)"
> > > QUERY PLAN
> > >
> > >
> > >
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > >  Index Only Scan using idx_entities_lookup on entities
> (cost=0.42..6.20
> > > rows=1 width=50) (actual time=0.026..5.215 rows=20005 loops=1)
> > >    Index Cond: ((realm_id = 'POLARIS'::text) AND (catalog_id =
> > > '6911759911282783572'::bigint) AND (parent_id =
> > > '6911759911282783572'::bigint) AND (type_code = 6) AND (sub_type_code
> = 0))
> > >    Heap Fetches: 20001
> > >  Planning Time: 0.331 ms
> > >  Execution Time: 5.785 ms
> > >
> > > Conclusion:
> > > Current results show that reducing columns for the selection helps
> decrease
> > > query execution time, BUT adding a new index does not affect search
> speed.
> > > By default, the existing idx_entities index is used along with an
> > > additional FILTER, which provides better performance.
> > >
> > > Alternative for alternative:
> > >
> > > We can add the new idx_entities_lookup by default and also add a
> comment to
> > > the documentation, recommending Polaris service maintainers to delete
> or
> > > optimize it manually if they experience performance issues due to the
> extra
> > > index.
> > >
> > > [1]: https://github.com/apache/polaris/pull/2465
> > > [2]:
> https://github.com/apache/polaris/pull/2465#issuecomment-3244547998
> > > [3]:
> > >
> > >
> https://github.com/apache/polaris/releases/tag/apache-polaris-1.1.0-incubating-rc0
> > >
> > >
> > > [4]: > cat test1.sql
> > > SELECT id, catalog_id, parent_id, type_code, name, entity_version,
> > > sub_type_code, create_timestamp, drop_timestamp, purge_timestamp,
> > > to_purge_timestamp, last_update_timestamp, properties,
> internal_properties,
> > > grant_records_version, location_without_scheme
> > >  FROM POLARIS_SCHEMA.ENTITIES
> > >  WHERE catalog_id = 6911759911282783572 AND sub_type_code = 0 AND
> realm_id
> > > = 'POLARIS' AND parent_id = 6911759911282783572 AND type_code = 6
> > >
> > > [5]: > cat test2.sql
> > > SELECT id, catalog_id, parent_id, type_code, name, sub_type_code
> > >  FROM POLARIS_SCHEMA.ENTITIES
> > >  WHERE catalog_id = 6911759911282783572 AND sub_type_code = 0 AND
> realm_id
> > > = 'POLARIS' AND parent_id = 6911759911282783572 AND type_code = 6
> > >
> > > Thanks.
> > >
> >
>

Reply via email to