Thank you so much for the benchmarks!
+1. Having benchmark results committed will help catch any degradation or
correctness issue that creeps in, much like the golden files for TPC-DS /
TPC-H in the Spark repo.

Best,
Prashant Sungh

On Wed, Mar 19, 2025 at 8:53 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I think having a tool like this is a great idea. Would we be able to host
> the results over time as well? Like an official build run that triggers on
> a daily basis?
>
> On Wed, Mar 19, 2025 at 10:07 AM Pierre Laporte <pie...@pingtimeout.fr>
> wrote:
>
> > Hi
> >
> > I have been working on a set of benchmarks for Polaris [1] and would like
> > to contribute them to the project.  I have opened a PR with the code, in
> > case anybody is interested.
> >
> > The benchmarks are written using Gatling.  The core design decision
> > consists of building a procedural dataset, loading it into Polaris, and
> > then reusing it for all subsequent benchmarks.  The procedural approach
> > makes it possible to deterministically regenerate the same dataset at
> > runtime over and over, without having to store the actual data.
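> >
> > For illustration, here is a minimal sketch (not the actual PR code) of
> > how a fixed seed could make such a dataset reproducible; the object,
> > names and seed value below are hypothetical:
> >
> >     import scala.util.Random
> >
> >     object ProceduralDataset {
> >       // Hypothetical: the same seed yields the same entity names on
> >       // every run, so the dataset never needs to be stored on disk.
> >       val seed = 42L
> >
> >       def namespaceNames(count: Int): Seq[String] = {
> >         val rng = new Random(seed)
> >         (1 to count).map(i => s"ns_${i}_${rng.alphanumeric.take(8).mkString}")
> >       }
> >     }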
> >
> > With this, it is trivial to generate a large number of Polaris entities.
> > Typically, I used this to benchmark the NoSQL persistence implementation
> > with 65k namespaces, 65k tables and 65k views.  Increasing that to
> > millions would only require a single parameter change.  Additionally,
> > the dataset currently includes property updates for namespaces, tables
> > and views, which can quickly create hundreds of manifests.  This can be
> > useful for table maintenance testing.
> >
> > Three benchmarks have been created so far:
> >
> >    - A benchmark that populates an empty Polaris server with a dataset
> >    that has predefined attributes
> >    - A benchmark that issues only read queries over that dataset
> >    - A benchmark that issues read and write queries (entity updates) over
> >    that dataset, with a configurable read/write ratio (see the sketch
> >    after this list)
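> >
> > As a rough illustration of the read/write ratio (hypothetical endpoints,
> > numbers and class names, not the code from the PR), Gatling's
> > randomSwitch can express the mix:
> >
> >     import io.gatling.core.Predef._
> >     import io.gatling.http.Predef._
> >     import scala.concurrent.duration._
> >
> >     class ReadWriteSimulation extends Simulation {
> >       // Hypothetical ratio: 80% reads, 20% writes
> >       val readPercent = 80.0
> >
> >       val httpProtocol = http.baseUrl("http://localhost:8181")
> >
> >       // Each virtual user picks a read or a write according to the ratio
> >       val scn = scenario("Read/Write mix").randomSwitch(
> >         readPercent -> exec(http("read").get("/api/catalog/v1/ns1/tables/t1")),
> >         (100 - readPercent) -> exec(http("write").post("/api/catalog/v1/ns1/tables/t1"))
> >       )
> >
> >       setUp(scn.inject(constantUsersPerSec(10).during(1.minute)))
> >         .protocols(httpProtocol)
> >     }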
> >
> > The benchmarks/README.md contains instructions to build and run the
> > benchmarks, as well as a description of the kind of dataset that should
> > be generated.
> >
> > As with every Gatling benchmark, an HTML report is generated with
> > interactive charts showing query performance over time, response time
> > percentiles, etc.
> >
> > I would love to hear your feedback on it.
> >
> > Pierre
> >
> > [1] https://github.com/apache/polaris/pull/1208
> > --
> >
> > Pierre Laporte
> > @pingtimeout <https://twitter.com/pingtimeout>
> > pie...@pingtimeout.fr
> > http://www.pingtimeout.fr/
> >
>
