Hi,

I have been working on a set of benchmarks for Polaris [1] and would like to contribute them to the project. I have opened a PR with the code, in case anybody is interested.

The benchmarks are written using Gatling. The core design decision is to build a procedural dataset, load it into Polaris, and then reuse it for all subsequent benchmarks. The procedural aspect makes it possible to deterministically regenerate the same dataset at runtime, over and over, without having to store the actual data (a rough sketch of the idea follows the benchmark list below). This makes it trivial to generate large numbers of Polaris entities. Typically, I used it to benchmark the NoSQL persistence implementation with 65k namespaces, 65k tables and 65k views; increasing that to millions would only require a one-parameter change. Additionally, the dataset currently includes property updates for namespaces, tables and views, which can quickly create hundreds of manifests. This can be useful for table maintenance testing.

Three benchmarks have been created so far:
- A benchmark that populates an empty Polaris server with a dataset that has predefined attributes
- A benchmark that issues only read queries over that dataset
- A benchmark that issues read and write queries (entity updates) over that dataset, with a configurable read/write ratio (see the second sketch below)
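To make the procedural idea concrete, here is a minimal sketch; this is not the actual PR code, and the names, seed value and layout are made up for illustration. Every entity name is a pure function of a fixed seed and an index, so the whole dataset can be regenerated on demand instead of being stored:

    import scala.util.Random

    object DatasetSketch {
      val seed = 42L              // fixed seed => same dataset on every run
      val namespaceCount = 65536  // the one parameter to bump for millions

      // Name of the i-th namespace; the same (seed, i) pair always yields
      // the same name, so nothing has to be persisted between runs.
      def namespaceName(i: Int): String = {
        val rnd = new Random(seed + i)
        s"ns_${i}_${rnd.alphanumeric.take(8).mkString}"
      }

      def main(args: Array[String]): Unit =
        (0 until 3).foreach(i => println(namespaceName(i)))
    }

A Gatling feeder built from such a function, e.g. Iterator.from(0).map(i => Map("ns" -> DatasetSketch.namespaceName(i))), can then drive the requests without any data files.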
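As for the configurable read/write ratio, Gatling's randomSwitch is one natural way to express it in a simulation. Again, this is only a hedged sketch, not the PR code: the endpoint paths, base URL, parameter names and injection profile are placeholders.

    import scala.concurrent.duration._
    import io.gatling.core.Predef._
    import io.gatling.http.Predef._

    class ReadWriteSimulation extends Simulation {
      val readRatio = 80.0  // hypothetical knob: percentage of read requests

      val httpProtocol = http.baseUrl("http://localhost:8181")  // assumed Polaris URL

      // Each virtual user takes the read branch readRatio% of the time,
      // and the write branch otherwise.
      val scn = scenario("read-write mix").randomSwitch(
        readRatio -> exec(http("read")
          .get("/api/catalog/v1/namespaces/ns_0/tables/t_0")),
        (100 - readRatio) -> exec(http("write")
          .post("/api/catalog/v1/namespaces/ns_0/tables/t_0")
          .body(StringBody("""{"updates": []}""")))
      )

      setUp(scn.inject(constantUsersPerSec(10).during(60.seconds)))
        .protocols(httpProtocol)
    }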
The benchmarks/README.md contains instructions to build and run the benchmarks, as well as a description of the kind of dataset that should be generated. As with every Gatling benchmark, an HTML report is generated with interactive charts showing query performance over time, response time percentiles, etc.

I would love to hear your feedback on it.

Pierre

[1] https://github.com/apache/polaris/pull/1208

--
Pierre Laporte
@pingtimeout <https://twitter.com/pingtimeout>
pie...@pingtimeout.fr
http://www.pingtimeout.fr/