Hi all,

Today, the Iceberg spec has table properties defining the transaction isolation 
levels: write.delete/update/merge.isolation-level. These properties can be set 
to either `snapshot` or `serializable`. With a properly designed writer and 
Iceberg multi version snapshots, we can achieve single table snapshot isolation 
or even serializable isolation. 

But for queries involving multiple tables, the spec does not provide a 
mechanism to achieve a global snapshot consistency. The Iceberg REST Catalog 
(IRC) API provides only single-table load operation: LoadTable, and clients 
would need to call this API multiple times to resolve table metadata in a 
single query statement - each could represent a different snapshot view of the 
catalog.

This creates problem especially for engines that already support global SI. For 
example, the transaction semantics for AWS Redshift when query its native 
tables is different than querying against Iceberg tables, which surprises 
customers at times.

There were proposals in the past in the context of multi-statement transaction 
discussion 
(https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit#heading=h.qb9z621zr507).
 But I feel these proposals are too complicated and require significant changes 
to the catalog/IRC protocol.

Here I propose a simpler approach: add a batch LoadTables API, and rely on the 
catalog's underlying system-of-record to provide snapshot isolation for that 
batch read. 

When a client calls LoadTables({table_a, table_b, table_c}), the catalog reads 
the current metadata for all requested tables in a single consistent operation 
(e.g., a TransactGetItems in DynamoDB, or a single SI read in a relational DB). 
The client receives a consistent cross-table snapshot — the latest committed 
state of all requested tables as of a single point in time.

This would give us the statement level global snapshot consistency. It doesn’t 
provide full transaction level SI consistency for multi statement transactions, 
but I believe it’s a reasonable trade off.

I capture the details of this proposal in this doc - 
https://docs.google.com/document/d/1u11b4pzeFUKD0XX--nHPj-DoYcNeCgOe94WKCaX2XMI/edit?usp=sharing
 

I also created a prototype that implements the LoadTables API for Apache 
Polaris, levering the underlying Postgres for the snapshot isolation - 
https://github.com/xndai/polaris/commit/f4eb514a2920effe67ecfb8c64e2e3fa418baf11

Feedbacks and comments are welcomed!

Reply via email to