Hello everyone,

I’d like to propose a change to how we handle the serialization of bound
expressions and transforms. Presently, when expressions and transforms are
serialized, they are first converted to an unbound expression / transform
that references columns by name, and we then look those columns up by name
when we deserialize and want to bind them back to a given schema.

Having a way to serialize these bound expressions / transforms would be
really helpful in cases such as Row Access Policy, where the catalog returns
an expression that needs to be enforced by the engine. There it is
important for the catalog to return a bound expression, to protect against
column rename / drop (for example when reading an old snapshot), because the
catalog does not know which schema the client will use to read the table.

I have a proposal that introduces a notion of IDReference which, apart from
the name, includes the field ID of the column. This information is
serialized in the representation, so that we can deserialize it back
correctly.
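
To make the difference concrete, here is a minimal Python sketch of the idea.
It is not the Iceberg API: the class names, the toy schema (a map from field
ID to current column name), the helper functions and the JSON keys are all
assumptions made up for illustration. It only shows why a reference that
carries the field ID survives a rename or a drop, while a name-only
reference does not.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional

# Toy schema (assumption): field ID -> current column name.
Schema = Dict[int, str]


@dataclass(frozen=True)
class NameReference:
    """Unbound reference: identifies a column by name only."""
    name: str


@dataclass(frozen=True)
class IDReference:
    """Hypothetical reference that carries the field ID next to the name."""
    name: str
    field_id: int


def to_json(ref: IDReference) -> str:
    # The key names are illustrative, not taken from any spec.
    return json.dumps({"name": ref.name, "field-id": ref.field_id})


def from_json(text: str) -> IDReference:
    data = json.loads(text)
    return IDReference(name=data["name"], field_id=data["field-id"])


def resolve_by_name(ref: NameReference, schema: Schema) -> Optional[int]:
    """Return the field ID whose current name matches, or None."""
    for field_id, name in schema.items():
        if name == ref.name:
            return field_id
    return None


def resolve_by_id(ref: IDReference, schema: Schema) -> Optional[int]:
    """Return the field ID if it still exists in the schema, or None."""
    return ref.field_id if ref.field_id in schema else None


# Schema at the time the policy was written: "region" has field ID 2.
renamed_schema: Schema = {1: "id", 2: "sales_region"}   # column 2 renamed
readded_schema: Schema = {1: "id", 3: "region"}         # column 2 dropped, new "region" added

by_name = NameReference("region")
by_id = from_json(to_json(IDReference("region", 2)))    # serialization round trip

print(resolve_by_name(by_name, renamed_schema))  # None: the rename breaks the lookup
print(resolve_by_id(by_id, renamed_schema))      # 2: same column, new name
print(resolve_by_name(by_name, readded_schema))  # 3: silently binds to a different column
print(resolve_by_id(by_id, readded_schema))      # None: the original column is gone
```

The third case is the risky one for something like a row access policy: a
name-only lookup binds to an unrelated column without any error, whereas an
ID-based lookup can detect that the original column no longer exists.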

This would come in really handy in cases such as the Read Restrictions spec:

   - [SPEC] Add finer grained read restrictions as part of loadTable #13879
   <https://github.com/apache/iceberg/pull/13879>

We have discussed it a couple of times in the Catalog community syncs:

   - <https://docs.google.com/document/d/1iPGVCIcr-M0XtAiudOguWAvmqIdVgpYN5vz5ohO8PKw/edit?tab=t.0#heading=h.cr6o1g2rn5hc>

I have had a working PR with the IRC spec and API changes for this for some
time now, to demonstrate how it would work end to end. It would be really
nice to get your eyes / feedback on it.

Best,

Prashant Singh
