Hi,

We would like to store snapshot metadata that is needed for producing and
consuming incremental data. One example is the maximum event time we have
processed so far (i.e. a watermark), so that we know where to read from next.

Some of the possible options that we have discovered so far are:

1) to store such metadata in the table properties (TableMetadata), but the
Iceberg specification advises against this.

2) to use the maximum of the upper bounds of an event timestamp column, as
tracked for the data files in a snapshot. This would not be accurate: the
largest event timestamp actually present in the data can be earlier than the
end of the event-time range we have processed (especially for sparse datasets,
where a processed window may contain few or no events).
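To illustrate the problem with option 2, here is a minimal sketch. The FileStats record is a hypothetical stand-in for the per-file column metrics Iceberg keeps (only the event-timestamp upper bound is modelled), and the file names and timestamps are made up. Deriving the watermark from the largest upper bound gives 11:05 even though everything up to 12:00 has been processed.

```java
import java.time.Instant;
import java.util.List;
import java.util.Optional;

public class UpperBoundWatermark {
    // Hypothetical stand-in for per-file column metrics; only the
    // upper bound of the event-timestamp column is modelled.
    record FileStats(String path, Instant eventTimeUpperBound) {}

    // Option 2: treat the max of the per-file upper bounds as the watermark.
    static Optional<Instant> maxUpperBound(List<FileStats> files) {
        return files.stream()
                .map(FileStats::eventTimeUpperBound)
                .max(Instant::compareTo);
    }

    public static void main(String[] args) {
        // The job processed the event-time window up to 12:00, but the sparse
        // source only produced events up to 11:05 within that window.
        Instant processedUpTo = Instant.parse("2021-06-01T12:00:00Z");
        List<FileStats> files = List.of(
                new FileStats("f1.parquet", Instant.parse("2021-06-01T10:30:00Z")),
                new FileStats("f2.parquet", Instant.parse("2021-06-01T11:05:00Z")));

        Instant derived = maxUpperBound(files).orElseThrow();
        System.out.println("derived=" + derived + " processed=" + processedUpTo);
        // The derived value lags behind, so the next run would re-read 11:05..12:00.
        System.out.println(derived.isBefore(processedUpTo)); // true
    }
}
```

The derived value can only ever be as late as the last event that happened to arrive, not the point up to which we have actually read.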

3) to store such metadata in the summary property of the snapshot. This seems
to be the most promising approach, but we wanted to know whether there are any
restrictions on the size of the information that can be stored in the summary
property of a Snapshot. A downside to this approach is that the summary is a
map of Strings to Strings, so we would always have to convert our data to
Strings in order to use it.
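For option 3, the String conversion itself can be kept small. A minimal sketch, assuming the watermark is an Instant serialized as ISO-8601; the key name "myapp.event-time-watermark" is hypothetical, and a plain HashMap stands in for the snapshot summary so the example runs on the JDK alone. As far as we understand the Java API, the writer side would pass the same key/value through set(...) on the snapshot-producing operation (e.g. table.newAppend()), and the reader would look it up in Snapshot.summary(), but please treat that wiring as an assumption.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class SummaryWatermark {
    // Hypothetical key; a prefix avoids clashing with Iceberg's own summary keys.
    static final String WATERMARK_KEY = "myapp.event-time-watermark";

    // Summary values are Strings, so store the watermark as ISO-8601 text.
    static void putWatermark(Map<String, String> summary, Instant watermark) {
        summary.put(WATERMARK_KEY, watermark.toString());
    }

    // Parse it back; empty if this snapshot was written without a watermark.
    static Optional<Instant> readWatermark(Map<String, String> summary) {
        return Optional.ofNullable(summary.get(WATERMARK_KEY)).map(Instant::parse);
    }

    public static void main(String[] args) {
        Map<String, String> summary = new HashMap<>();
        putWatermark(summary, Instant.parse("2021-06-01T12:00:00Z"));
        System.out.println(summary); // {myapp.event-time-watermark=2021-06-01T12:00:00Z}
        System.out.println(readWatermark(summary).orElseThrow());
    }
}
```

Returning an Optional matters because not every snapshot is necessarily written by our job (maintenance operations, for example), so a reader may have to walk back through parent snapshots until it finds one that carries the key.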

If none of the above is the most suitable place to store such information,
could anyone advise on other approaches they have taken to solve this?

Thanks,
Dabby
