For posterity, https://docs.google.com/document/d/1U9rprj8w8-Q0SnQvRMvoVlbX996z-89eOkVwWTQaZG0/edit?tab=t.0#bookmark=id.cmp67w7aja0o explains a bit about why the Postgres implementation hits this. To recap from the doc comment:
1. The current Postgres impl naively follows the same design pattern as FDB, meaning "wrap everything in a big transaction". In this model, the fact that the tables are in different "namespaces" is obscured by the fact that the "entitesActive" table will experience direct contention across all the things happening in the same catalog (and possibly even across catalogs?) 2. The current multi-table transaction code has some TODOs to actually do retries In this world, it's working as intended that the transaction would fail because it "thinks" there are conflicts despite us logically knowing that there aren't conflicts, and then without catalog-layer retries, the db error propagates out -- the expected client-side handling of doing a full client-side retry should still work though. In this section, the proposed update to the EclipseLink implementation should remove the big transaction, and then the single-row atomic updates should no longer step on each others' toes for different tables: https://docs.google.com/document/d/1U9rprj8w8-Q0SnQvRMvoVlbX996z-89eOkVwWTQaZG0/edit?tab=t.0#bookmark=id.cmp67w7aja0o On Thu, Jan 9, 2025 at 8:32 AM Dmitri Bourlatchkov <di...@apache.org> wrote: > Hi All, > > > I have been experimenting with running Polaris on PostgreSQL where multiple > servers access the same database. > > I present two specific test cases in the appendixes below for review. The > test cases are possible in real life even though they may look a bit > contrived in the current state of the code. > > The errors that come out of running those test requests are valid in > themselves, however, I’d like this discussion to avoid diving right into > fixing the specific errors, but concentrate on the conceptual correctness > aspects of Polaris interactions with the database. > > My personal interpretation of the test outcomes (having current Polaris > code in mind) is that interactions with the database are not well-defined > in terms of expected commit behaviours and transaction isolation levels. > > Note: the test case using Serializable isolation is largely inspired by > code comments in TransactionWorkspaceMetaStoreManager. > > Given interest in NoSQL backends expressed in previous discussions, I’d > like this discussion to focus on how Polaris persistence-related interfaces > can be defined so that clear expectations are stated on database behaviours > so as to enable reasoning about correctness of alternative persistence > implementations on a conceptual level (i.e. without relying on tests to > prove correctness (while using TDD to validate code, of course)). > > Please share your thoughts on this matter. > > From my personal point of view, I’d like to refactor what > TransactionWorkspaceMetaStoreManager does into a first class Polaris > service object (or interface) as opposed to being an ad-hoc runtime hook > for some special cases. The end goal being a REST services refactoring > where changes are accumulated in-memory and then sent to the > database-specific persistence implementation to be committed as a whole. > The expectations on the database would be that the reads and writes made in > the context of a “session” when it is committed must be non-conflicting > with concurrent reads and writes made by other sessions. The important > distinction with respect to EclipseLink (or any other JPA implementation) > is that reads and writes in the proposed approach are reads and writes made > by Polaris Services, not by the JPA middleware. This way consistency > concerns are made explicit in Polaris code. The “session” model should also > be flexible enough to allow different implementations to define session > boundaries and resolve read/write conflicts according to the backend > capabilities for a wide range of database systems. > > Thanks, > > Dmitri. > > ======== Appendix A: Unrelated Changes with Serializable Tx Isolation > ======= > > Test code (old, raw, but functional): > > https://github.com/dimas-b/polaris/commit/88960348906c33452cc4a81cf39690272bbb191c > > Test case: > > - > > Run PostgreSQL with the “default-realm” database configuration with > “serializable” Tx isolation mode by default > - > > alter database "default-realm" set default_transaction_isolation = > serializable; > - > > Bootstrap Polaris (with EclipseLink) > - > > Create Polaris catalog (e.g. “polaris2”) with FILE storage > - > > ./polaris --client-id *** --client-secret *** catalogs create > polaris2 --storage-type FILE --default-base-location file:///tmp/pol2 > - > > Create test tables in two separate namespaces > - > > java -jar > test-client/build/libs/test-client-1.0.0-incubating-SNAPSHOT.jar > http://localhost:8181/api/catalog 5 <NAMESPACE> create > - > > Run two Polaris servers connected to the same PostgreSQL database, but > - > > Run two test clients, each connected to its own Polaris server and using > its own namespace > - > > java -jar > test-client/build/libs/test-client-1.0.0-incubating-SNAPSHOT.jar > http://localhost:8181/api/catalog 5 ns4 update w_0_ > - > > java -jar > test-client/build/libs/test-client-1.0.0-incubating-SNAPSHOT.jar > http://localhost:8191/api/catalog 5 ns5 update w_1_ > > > The clients modify 5 tables together in the same Iceberg commit, but in > different (!) namespaces. > > After a (short) while, one of the clients will get this error: > > Caused by: org.apache.iceberg.exceptions.ServiceFailureException: Service > failed: 500: Exception [EclipseLink-4002] (Eclipse Persistence Services - > 4.0.4.v202407190748-059428cdd2583c46f1f3e50d235854840a6fa9a7): > org.eclipse.persistence.exceptions.DatabaseException > > Internal Exception: org.postgresql.util.PSQLException: ERROR: could not > serialize access due to read/write dependencies among transactions > > Detail: Reason code: Canceled on identification as a pivot, during > conflict out checking. > > Hint: The transaction might succeed if retried. > > Error Code: 0 > > Call: SELECT CATALOGID, ID, PARENTID, TYPECODE, NAME, SUBTYPECODE FROM > ENTITIES_ACTIVE WHERE ((((CATALOGID = ?) AND (PARENTID = ?)) AND (TYPECODE > = > ?)) AND (NAME = ?)) > > bind => [4 parameters bound] > > Query: ReadAllQuery(referenceClass=ModelEntityActive sql="SELECT CATALOGID, > ID, PARENTID, TYPECODE, NAME, SUBTYPECODE FROM ENTITIES_ACTIVE WHERE > ((((CATALOGID = ?) AND (PARENTID = ?)) AND (TYPECODE = ?)) AND (NAME = > ?))") > > ... 11 more > > (I can provide the full exception stack trace if somebody is interested). > > ======== Appendix B: Concurrent Changes with Read Committed Isolation Level > ======= > > Similar setup as above, but with client side retries and “read committed” > isolation level (default in PostgreSQL). > > Client code: > > https://github.com/dimas-b/polaris/commit/070878fd2f6bb2731780649e0bc980865bf57317 > > Run both clients on the same namespace, but modifying different schema > columns. > > - > > java -jar > test-client/build/libs/test-client-1.0.0-incubating-SNAPSHOT.jar > http://localhost:8181/api/catalog 5 ns1 update w_0_ > - > > java -jar > test-client/build/libs/test-client-1.0.0-incubating-SNAPSHOT.jar > http://localhost:8191/api/catalog 5 ns1 update w_1_ > > > This failure is a bit trickier to reproduce and may require a few > client-side restarts, but eventually one of the clients will get this > error: > > Caused by: org.apache.iceberg.exceptions.ServiceFailureException: Service > failed: 500: jakarta.persistence.OptimisticLockException: Exception > [EclipseLink-5006] (Eclipse Persistence Services - > 4.0.4.v202407190748-059428cdd2583c46f1f3e50d235854840a6fa9a7): > org.eclipse.persistence.exceptions.OptimisticLockException > > Exception Description: The object > [org.apache.polaris.core.persistence.models.ModelEntity@62e2ea2b] cannot > be > updated because it has changed or been deleted since it was last read. > > Class> org.apache.polaris.core.persistence.models.ModelEntity Primary Key> > [[51, 54]: 2596] > > ... 11 more >