Hello everyone,

I’d like to propose a change to how we handle the serialization of bound expressions and transforms. Today, when an expression or transform is serialized, it is first converted to an unbound expression / transform that references columns by name. When we deserialize it and want to bind it back to a given schema, we look those columns up by name again.
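To illustrate the difference, here is a minimal, self-contained Java sketch contrasting today's name-based binding with binding by field ID, as the proposed IDReference would allow. It deliberately does not use Iceberg's real classes: `IdReference`, `SimpleSchema`, `bind`, `bindByName`, and the `field-id` JSON key are all hypothetical names chosen for illustration. It shows that a reference carrying the field ID can still be bound after a column rename, while a name-only lookup fails.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch only: none of these types are Iceberg APIs.
public class IdReferenceSketch {

  /** A column reference that carries the field ID alongside the name (hypothetical). */
  record IdReference(String name, int fieldId) {
    /** Serializes to JSON; the "field-id" key is an assumption. No escaping, illustration only. */
    String toJson() {
      return "{\"type\":\"reference\",\"name\":\"" + name + "\",\"field-id\":" + fieldId + "}";
    }
  }

  /** Minimal stand-in for a table schema: field ID -> column name in that schema. */
  record SimpleSchema(Map<Integer, String> namesById) {}

  /** Binds by field ID, returning the column's name in the given schema, if present. */
  static Optional<String> bind(IdReference ref, SimpleSchema schema) {
    return Optional.ofNullable(schema.namesById().get(ref.fieldId()));
  }

  /** Name-only binding, which breaks when the column has been renamed. */
  static Optional<String> bindByName(String name, SimpleSchema schema) {
    return schema.namesById().containsValue(name) ? Optional.of(name) : Optional.empty();
  }

  public static void main(String[] args) {
    IdReference ref = new IdReference("ssn", 3);
    // Column 3 was renamed from "ssn" to "social_security_number".
    SimpleSchema renamed = new SimpleSchema(Map.of(1, "id", 3, "social_security_number"));
    System.out.println(ref.toJson());
    System.out.println(bind(ref, renamed));         // Optional[social_security_number]
    System.out.println(bindByName("ssn", renamed)); // Optional.empty
  }
}
```

The same ID-based lookup is what would let a catalog-provided expression bind correctly against an older snapshot's schema, where a column may still exist under a name that differs from the current one.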
Being able to serialize these bound expressions / transforms would be really helpful for use cases such as Row Access Policies, where the catalog returns an expression that the engine must enforce. In that case, it's important for the catalog to return a bound expression so that the policy stays correct across column renames and drops (e.g., when reading an old snapshot), since the catalog doesn't know which schema the client will use to read the table.

My proposal introduces the notion of an IDReference, which includes the column's field ID in addition to its name, and serializes that field ID in the representation so that we can deserialize and bind it back correctly. This would come in really handy for cases such as the read restrictions spec here:
- [SPEC] Add finer grained read restrictions as part of loadTable #13879 <https://github.com/apache/iceberg/pull/13879>

We have discussed this a couple of times in the catalog community syncs:
- https://docs.google.com/document/d/1iPGVCIcr-M0XtAiudOguWAvmqIdVgpYN5vz5ohO8PKw/edit?tab=t.0#heading=h.cr6o1g2rn5hc

I've had a working PR with the IRC spec and API changes up for some time now that demonstrates how this would work end to end. It would be really nice to get your eyes and feedback on it.

Best,
Prashant Singh
