I'm starting to open some small, isolated PRs.

The first ones are https://github.com/apache/polaris/pull/1229 and https://github.com/apache/polaris/pull/1230. They don't change anything in the existing code base, but help reduce the size of the big https://github.com/apache/polaris/pull/1189.

On 17.03.25 15:45, Jean-Baptiste Onofré wrote:
Hi Robert,

Thanks for the update and the draft PR!

I would like to use this thread to thank Dennis. Big kudos to Dennis
for the changes he made: without these changes, it would have been
impossible to add new backends like MongoDB.

I propose we review and comment on Robert's PR.

I would also like to propose a community meeting to discuss the
Persistence Improvement and drive consensus.
What about Tuesday, March 25th at 9:30am PST?

Thanks all!

Regards
JB

On Mon, Mar 17, 2025 at 2:43 PM Robert Stupp <sn...@snazy.de> wrote:
Hi,

I’ve made good progress on building the integration for NoSQL
databases. The initial code supports MongoDB [A], but is not limited to
that database. A working implementation has been pushed as a draft PR
[1] to illustrate what it can look like once it is fully
integrated. A couple of smaller PRs will follow.

Background: The only common denominator for “synchronization purposes”
that all NoSQL databases support is a single-row compare-and-swap (CAS)
operation. Think of this as (pseudo-SQL) “UPDATE table SET x =
:new_value WHERE primary_key = :primary_key AND x = :expected_old_value”.
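To make the single-row CAS idea concrete, here is a minimal in-memory sketch in Java. It is not code from the draft PR: the class name `CasSketch` and all of its methods are hypothetical, and a `ConcurrentHashMap` stands in for a database table. It only illustrates the semantics of the pseudo-SQL statement above, plus the usual optimistic retry loop built on top of it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory stand-in for a table that offers single-row CAS. */
final class CasSketch {
  // primary_key -> x (assumption: one value column per row)
  private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();

  /** Creates the row only if no row with this key exists yet. */
  boolean insert(String primaryKey, String value) {
    return rows.putIfAbsent(primaryKey, value) == null;
  }

  String get(String primaryKey) {
    return rows.get(primaryKey);
  }

  /**
   * Mirrors: UPDATE table SET x = :new_value
   *          WHERE primary_key = :primary_key AND x = :expected_old_value
   * Returns true only if exactly that row was updated.
   */
  boolean compareAndSwap(String primaryKey, String expectedOldValue, String newValue) {
    return rows.replace(primaryKey, expectedOldValue, newValue);
  }

  /** Optimistic update: read, compute, CAS, and retry if another writer won. */
  String update(String primaryKey, UnaryOperator<String> change) {
    while (true) {
      String current = rows.get(primaryKey);
      if (current == null) {
        throw new IllegalStateException("no such row: " + primaryKey);
      }
      String next = change.apply(current);
      if (rows.replace(primaryKey, current, next)) {
        return next;
      }
      // CAS failed: a concurrent writer changed the row, so re-read and retry.
    }
  }
}
```

A caller that loses the race sees `compareAndSwap` return `false` and has to re-read the row before trying again, which is what the `update` loop does. A real backend would map the same contract onto the database's conditional-update primitive.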

The most important objective for the implementation is correctness,
especially in scenarios with high concurrent load. Explicit tests to
verify the correctness are included, for the CI “use case” and for
manual/special runs against a clustered database setup (which are just
“too much” for the GitHub-hosted runners).

The current integration point is
‘MetaStoreManagerFactory’/‘PolarisMetaStoreManager’, implemented in the
“bridge” Gradle project.

The ‘components/persistence/README.md’ in the draft-PR contains more
technical information.

A benchmarking tool to measure performance and correctness of Polaris
will be proposed soon as a separate/independent effort. We have used
this benchmarking tool to measure performance and implicitly the
correctness of the implementation.

Implementations for particular (No)SQL databases are isolated in one
(Gradle) project per database. This is effectively/conceptually the same
approach that already works for Nessie, which supports quite a few
databases [2].

Robert

[1] https://github.com/apache/polaris/pull/1189
[2] https://projectnessie.org/nessie-latest/configuration/#support-for-the-database-specific-implementations
[A] Technically there is also an “in memory” implementation for testing
purposes (not intended to replace the existing one).


--
Robert Stupp
@snazy
