On Mon, Apr 21, 2025 at 6:34 PM Yufei Gu <flyrain...@gmail.com> wrote:
> Thanks Pierre for driving this. The plan sounds good to me! A side
> question, are we planning to make a benchmark pipeline against the main
> branch?

I think it would be good to set up a pipeline against `main`, yes, although I am not sure which hardware could be used to support it. In the `Polaris benchmarks proposal` thread, Robert and JB mentioned the possibility of having sponsors provide Apache projects with hardware. That could be an area worth exploring.

The inevitable questions that will come up relate to benchmark cost, e.g. "how much time and how many instances would be needed?". That is directly linked to the amount of data and the number of queries we want Gatling to inject. In your experience, what would constitute a real-world Polaris deployment in terms of number of catalogs, namespaces, tables and views? What about the number of read and write queries per second?

> The only backend option now is the EclipseLink, we will have the
> JDBC backend soon.

Running the benchmarks against the EclipseLink implementation is doable, but keep in mind that, as per https://github.com/apache/polaris/issues/1123, the EclipseLink implementation can only serve one query at a time. Assuming the response times I saw when I benchmarked the EclipseLink and MongoDB persistence layers are still accurate, it would take close to 2h just to create 100k tables.

My understanding is that #1123 will not be fixed; instead, the JDBC persistence will be introduced and will gradually replace EclipseLink. That suggests that running regular tests against EclipseLink might not be the best use of time and resources. As for JDBC and NoSQL, those are cases we should definitely test regularly.

--
Pierre
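As a back-of-envelope check on the "close to 2h for 100k tables" figure above: with #1123 forcing requests to be served one at a time, total wall-clock time is simply request count times per-request latency. The ~70 ms per-request latency below is an assumed illustrative value, not a measured Polaris number:

```python
# Back-of-envelope estimate for a persistence layer that serves one
# query at a time (the EclipseLink limitation described in issue #1123).
# The 70 ms per-request latency is an assumption for illustration only.

def serialized_duration_s(num_requests: int, latency_s: float) -> float:
    """Total wall-clock seconds when requests cannot overlap."""
    return num_requests * latency_s

total_s = serialized_duration_s(100_000, 0.07)  # 100k CreateTable calls at ~70 ms each
print(f"{total_s / 3600:.2f} h")  # about 1.94 h, i.e. "close to 2h"
```

The same model also makes the cost question concrete: halving per-request latency, or allowing N concurrent requests (as a JDBC backend would), divides the runtime accordingly.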