On Mon, Apr 21, 2025 at 6:34 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Thanks Pierre for driving this. The plan sounds good to me! A side
> question, are we planning to make a benchmark pipeline against the main
> branch?


I think it would be good to set up a pipeline against `main`, yes.  Although
I am not sure which hardware could be used to support that.  In the
`Polaris benchmarks proposal` thread, Robert and JB mentioned the
possibility of having sponsors provide Apache projects with hardware.  That
could be an area worth exploring.  The inevitable question that will come
up relates to benchmark cost, i.e. "how much time and how many
instances would be needed?"  That is directly linked to the amount of
data and queries we want injected by Gatling.
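As a back-of-envelope illustration of how the cost question could be framed (every number below is an assumption for the sake of the example, not a measured Polaris figure):

```python
# Rough benchmark-duration estimator. All inputs are illustrative
# assumptions, not measured Polaris numbers.

def estimated_duration_hours(num_requests: int,
                             avg_latency_s: float,
                             concurrent_users: int) -> float:
    """Wall-clock time for Gatling to inject num_requests,
    assuming each virtual user issues its requests serially."""
    return (num_requests * avg_latency_s) / concurrent_users / 3600

# Example: 1M requests at an assumed 50 ms average latency,
# injected by 20 concurrent virtual users.
hours = estimated_duration_hours(1_000_000, 0.050, 20)
print(f"{hours:.2f} h")  # ~0.69 h
```

A formula like this is also what ties instance count to cost: doubling the virtual users roughly halves the wall-clock time, up to whatever the server can sustain.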

In your experience, in terms of number of catalogs, namespaces, tables and
views, what would constitute a real-world Polaris deployment?  What about
number of read and write queries per second?


> The only backend option now is the EclipseLink, we will have the
> JDBC backend soon.


Running the benchmarks against the EclipseLink implementation is doable.
But keep in mind that, as per https://github.com/apache/polaris/issues/1123,
the EclipseLink implementation can only serve one query at a time.
Assuming the response times I saw when benchmarking the EclipseLink and
MongoDB persistence layers are still accurate, it would take close to two
hours just to create 100k tables.

My understanding is that #1123 will not be fixed.  Instead, the JDBC
persistence will be used and will gradually replace EclipseLink, which
suggests that running regular benchmarks against EclipseLink might not be
the best use of time and resources.

As for JDBC and NoSQL, those are cases we should definitely test regularly.

--

Pierre
