Thank you so much for the benchmarks! +1. Having benchmark results committed will help catch any degradation or correctness issue that can creep in, much like the golden files for TPC-DS / TPC-H in the Spark repo.
Best,
Prashant Sungh

On Wed, Mar 19, 2025 at 8:53 AM Russell Spitzer <russell.spit...@gmail.com> wrote:

> I think having a tool like this is a great idea. Would we be able to host
> the results over time as well? Like an official build run that triggers on
> a daily basis?
>
> On Wed, Mar 19, 2025 at 10:07 AM Pierre Laporte <pie...@pingtimeout.fr> wrote:
>
> > Hi
> >
> > I have been working on a set of benchmarks for Polaris [1] and would
> > like to contribute them to the project. I have opened a PR with the
> > code, in case anybody is interested.
> >
> > The benchmarks are written using Gatling. The core design decision is
> > to build a procedural dataset, load it into Polaris, and then reuse it
> > for all subsequent benchmarks. The procedural aspect makes it possible
> > to deterministically regenerate the same dataset at runtime over and
> > over, without having to store the actual data.
> >
> > With this, it is trivial to generate large numbers of Polaris entities.
> > Typically, I used this to benchmark the NoSQL persistence
> > implementation with 65k namespaces, 65k tables, and 65k views.
> > Increasing that to millions would only require a single parameter
> > change. Additionally, the dataset currently includes property updates
> > for namespaces, tables, and views, which can quickly create hundreds
> > of manifests. This can be useful for table maintenance testing.
> >
> > Three benchmarks have been created so far:
> >
> > - A benchmark that populates an empty Polaris server with a dataset
> >   that has predefined attributes
> > - A benchmark that issues only read queries over that dataset
> > - A benchmark that issues read and write queries (entity updates) over
> >   that dataset, with a configurable read/write ratio
> >
> > The benchmarks/README.md contains instructions to build and run the
> > benchmarks, as well as a description of the kind of dataset that
> > should be generated.
> >
> > As with every Gatling benchmark, an HTML report is generated with
> > interactive charts showing query performance over time, response time
> > percentiles, etc.
> >
> > I would love to hear your feedback on it.
> >
> > Pierre
> >
> > [1] https://github.com/apache/polaris/pull/1208
> > --
> >
> > Pierre Laporte
> > @pingtimeout <https://twitter.com/pingtimeout>
> > pie...@pingtimeout.fr
> > http://www.pingtimeout.fr/
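
For readers skimming the archive, here is a minimal Scala sketch of the deterministic, procedural generation Pierre describes. It is an illustration only: the object name ProceduralDataset, the entity-naming scheme, and the seeding strategy are assumptions, not code from the PR.

    import scala.util.Random

    object ProceduralDataset {
      // Hypothetical sketch: a fixed seed yields the same pseudo-random
      // sequence, hence identical entity names on every run, so only the
      // seed and the counts need to be kept, never the dataset itself.
      def namespaces(seed: Long, count: Int): Seq[String] = {
        val rng = new Random(seed)
        (0 until count).map(i => s"ns_${i}_${rng.alphanumeric.take(8).mkString}")
      }

      def tables(seed: Long, namespace: String, count: Int): Seq[String] = {
        // Derive a per-namespace seed so each namespace gets a stable,
        // independent stream of table names.
        val rng = new Random(seed ^ namespace.hashCode)
        (0 until count).map(i => s"${namespace}.table_${i}_${rng.nextInt(1000000)}")
      }
    }

Because the generator is a pure function of the seed and the counts, rerunning it reproduces exactly the same entities, which is what allows the dataset to be regenerated at runtime rather than stored.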
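Similarly, a hedged sketch of how a configurable read/write ratio can be expressed with randomSwitch in Gatling's Scala DSL; the endpoints, ratio, and injection profile below are illustrative assumptions, not the PR's actual simulation.

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._
    import scala.concurrent.duration._

    class ReadWriteSimulation extends Simulation {
      // Illustrative values only; the actual benchmarks may differ.
      private val readRatio: Double = 80.0

      private val httpProtocol = http.baseUrl("http://localhost:8181")

      private val scn = scenario("read-write-mix")
        // randomSwitch executes each branch with the given percentage,
        // which is what makes the read/write ratio configurable.
        .randomSwitch(
          readRatio -> exec(
            http("read namespace")
              .get("/api/catalog/v1/catalog/namespaces/ns_0")),
          (100.0 - readRatio) -> exec(
            http("update namespace")
              .post("/api/catalog/v1/catalog/namespaces/ns_0/properties")
              .body(StringBody("""{"updates": {"k": "v"}}""")).asJson)
        )

      setUp(scn.inject(constantUsersPerSec(10).during(1.minute)))
        .protocols(httpProtocol)
    }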