Hi,

I have been working on a set of benchmarks for Polaris [1] and would like to contribute them to the project. I have opened a PR with the code, in case anybody is interested.

The benchmarks are written using Gatling. The core design decision is to build a procedural dataset, load it into Polaris, and then reuse it for all subsequent benchmarks. The procedural aspect makes it possible to deterministically regenerate the same dataset at runtime, over and over, without having to store the actual data (a rough sketch of the idea follows the benchmark list below). This makes it trivial to generate large numbers of Polaris entities. Typically, I used it to benchmark the NoSQL persistence implementation with 65k namespaces, 65k tables and 65k views; increasing that to millions would only require a one-parameter change. Additionally, the dataset currently includes property updates for namespaces, tables and views, which can quickly create hundreds of manifests. This can be useful for table maintenance testing.

Three benchmarks have been created so far:
- A benchmark that populates an empty Polaris server with a dataset that has predefined attributes
- A benchmark that issues only read queries over that dataset
- A benchmark that issues read and write queries (entity updates) over that dataset, with a configurable read/write ratio (see the second sketch below)
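To make the procedural idea concrete, here is a minimal sketch; this is not the actual PR code, and the names, seed value and layout are made up for illustration. Every entity name is a pure function of a fixed seed and an index, so the whole dataset can be regenerated on demand instead of being stored:

    import scala.util.Random

    object DatasetSketch {
      val seed = 42L              // fixed seed => same dataset on every run
      val namespaceCount = 65536  // the one parameter to bump for millions

      // Name of the i-th namespace; the same (seed, i) pair always yields
      // the same name, so nothing has to be persisted between runs.
      def namespaceName(i: Int): String = {
        val rnd = new Random(seed + i)
        s"ns_${i}_${rnd.alphanumeric.take(8).mkString}"
      }

      def main(args: Array[String]): Unit =
        (0 until 3).foreach(i => println(namespaceName(i)))
    }

A Gatling feeder built from such a function, e.g. Iterator.from(0).map(i => Map("ns" -> DatasetSketch.namespaceName(i))), can then drive the requests without any data files.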
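As for the configurable read/write ratio, Gatling's randomSwitch is one natural way to express it in a simulation. Again, this is only a hedged sketch, not the PR code: the endpoint paths, base URL, parameter names and injection profile are placeholders.

    import scala.concurrent.duration._
    import io.gatling.core.Predef._
    import io.gatling.http.Predef._

    class ReadWriteSimulation extends Simulation {
      val readRatio = 80.0  // hypothetical knob: percentage of read requests

      val httpProtocol = http.baseUrl("http://localhost:8181")  // assumed Polaris URL

      // Each virtual user takes the read branch readRatio% of the time,
      // and the write branch otherwise.
      val scn = scenario("read-write mix").randomSwitch(
        readRatio -> exec(http("read")
          .get("/api/catalog/v1/namespaces/ns_0/tables/t_0")),
        (100 - readRatio) -> exec(http("write")
          .post("/api/catalog/v1/namespaces/ns_0/tables/t_0")
          .body(StringBody("""{"updates": []}""")))
      )

      setUp(scn.inject(constantUsersPerSec(10).during(60.seconds)))
        .protocols(httpProtocol)
    }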
The benchmarks/README.md contains instructions to build and run the benchmarks, as well as a description of the kind of dataset that should be generated. As with every Gatling benchmark, an HTML report is generated with interactive charts showing query performance over time, response time percentiles, etc.

I would love to hear your feedback on it.

Pierre

[1] https://github.com/apache/polaris/pull/1208

--
Pierre Laporte
@pingtimeout <https://twitter.com/pingtimeout>
pie...@pingtimeout.fr
http://www.pingtimeout.fr/