1. What is your use case for an ordered primary key? We explored this option but discarded it, because ordering is mostly useful for secondary indexing, which should instead leverage the sort order and partition key information in Iceberg. For the upsert use case, ordering is not needed.
2. Yes, the read and write properties are defined in https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableProperties.java, but I think nothing prevents you from adding custom properties to that map: there is a public API for that operation, and this should not become a scalability bottleneck as long as the number of custom properties is bounded. But let's see how other people react to my suggestion here; I was not aware that table properties were expected to hold only read and write configs.

3. This sounds to me like a tagging feature, which could either be supported natively by Iceberg or be hooked to an external system to achieve the goal. Currently there is no public API for updating the snapshot summary, so I would not suggest leveraging that field. I am currently leaning towards not supporting this natively in Iceberg, because (1) snapshot information should be immutable once written, and (2) Iceberg handles the evolution of a table as a timeline of commits, whereas this feature annotates historical commits, which feels outside Iceberg's scope to me. It should be easy enough for you to hook this up to an external key-value store where the key is tableId + snapshotId and the value is the property map you want to manage, given that table ids and historical snapshot ids are immutable. Let's see what other people think about this.

-Jack

On Fri, May 14, 2021 at 10:31 AM QH Yan <[email protected]> wrote:
> Thanks for the reply Jack!
>
> 1. That is great to have!
> However, we might also like to preserve order among the key columns.
> What's the reason that it is a set?
>
> 2. Sorry, maybe I wasn't clear about the use case.
> We are working to extend Iceberg for our users. Here I meant metadata that
> is of interest to our users but not related to Iceberg's functionality. For
> example, the table owner may want to note that "Table A is about my
> research on topic B, and it should be published to group C after 2022Q1".
> This is not a P0 requirement for us; I just want to learn whether it is
> available.
> It seems to me that *table.properties* isn't intended for it, according
> to "This is used to control settings that affect reading and writing and
> is not intended to be used for arbitrary metadata."
> <https://iceberg.apache.org/spec/#table-metadata-fields>
>
> 3. Naming snapshots is a feature that we want to support.
> For example, it is convenient for a user who often time-travels to a weekly
> checkpoint that is named by a formatted date string instead of a random int.
> This feature is also related to compaction, which was discussed in a
> previous thread and meeting: we hope to compact and replace a
> named snapshot so that the time-travel reader gets better performance,
> which means the SnapshotSummary could contain information about compaction.
> Setting compaction aside (I know the current compaction proposal
> doesn't work that way), does it sound reasonable to add an optional
> "name" field in the SnapshotSummary for this?
>
>
> On Fri, May 14, 2021 at 12:16 PM Jack Ye <[email protected]> wrote:
>
>> 1. The primary key concept has now been added to Iceberg as the identifier
>> concept in the schema. I am in the process of adding the documentation. For
>> now, you can read the javadoc for more details:
>> https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Schema.java#L189-L210
>>
>> 2. Yes, properties is the one for user metadata.
>>
>> 3. Why do you want to store additional configs and names for snapshots?
>> The snapshot.summary field is written by the engine and has these defined
>> fields:
>> https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotSummary.java
>>
>> -Jack
>>
>> On Fri, May 14, 2021 at 8:49 AM QH Yan <[email protected]> wrote:
>>
>>> Hi there,
>>> We want to store 3 different kinds of metadata/config in Iceberg tables.
>>>
>>> 1. Additional settings/admin properties for a table (e.g. PrimaryKey
>>> info)
>>> I think it is *table.properties* according to here
>>> <https://iceberg.apache.org/spec/#table-metadata-fields> and would like
>>> to confirm.
>>>
>>> 2. User metadata at table level.
>>> Sometimes a user wants to take notes about a table. Is there a field for
>>> this? (A map of String is good enough, and I don't want to abuse
>>> table.properties, as the doc also points out.)
>>>
>>> 3. Snapshot name and additional configs
>>> It seems that *snapshot.summary* is for both of these according to here
>>> <https://iceberg.apache.org/spec/#snapshots>, am I right?
>>>
>>> Thank you!
>>>
>>> --
>>> *Qinhua*
>>>
>>>
>
> --
> *Qinhua*
>
>
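The external key-value approach from Jack's point 3 (key = tableId + snapshotId, value = a property map) can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not an Iceberg feature: an in-memory HashMap stands in for a real key-value service, the table id is treated as an arbitrary string, snapshot ids are longs, and `SnapshotAnnotationStore`, `SnapshotKey`, `annotate`, `get` and `findByName` are hypothetical names. It also covers the snapshot-naming use case by storing the name as an ordinary "name" annotation, leaving the snapshot itself immutable.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.OptionalLong;

// Hypothetical sketch: snapshot annotations kept outside Iceberg, keyed by
// (tableId, snapshotId). An in-memory HashMap stands in for a real
// key-value service; none of these names are Iceberg APIs.
public class SnapshotAnnotationStore {

    // Composite key. Both parts are immutable once a snapshot is committed,
    // so stored annotations stay valid as the table evolves.
    static final class SnapshotKey {
        final String tableId;
        final long snapshotId;

        SnapshotKey(String tableId, long snapshotId) {
            this.tableId = tableId;
            this.snapshotId = snapshotId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SnapshotKey)) {
                return false;
            }
            SnapshotKey other = (SnapshotKey) o;
            return snapshotId == other.snapshotId && tableId.equals(other.tableId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(tableId, snapshotId);
        }
    }

    private final Map<SnapshotKey, Map<String, String>> annotations = new HashMap<>();

    // Adds or overwrites one annotation; the Iceberg snapshot itself is never modified.
    public void annotate(String tableId, long snapshotId, String key, String value) {
        annotations
                .computeIfAbsent(new SnapshotKey(tableId, snapshotId), k -> new HashMap<>())
                .put(key, value);
    }

    // Returns a read-only view of a snapshot's annotations (empty if it has none).
    public Map<String, String> get(String tableId, long snapshotId) {
        return Collections.unmodifiableMap(
                annotations.getOrDefault(new SnapshotKey(tableId, snapshotId), Map.of()));
    }

    // Resolves a human-readable name, stored as the "name" annotation, to a
    // snapshot id. Assumes names are unique within a table; a linear scan is
    // fine for a sketch, but a real store would keep a secondary index.
    public OptionalLong findByName(String tableId, String name) {
        for (Map.Entry<SnapshotKey, Map<String, String>> e : annotations.entrySet()) {
            if (e.getKey().tableId.equals(tableId) && name.equals(e.getValue().get("name"))) {
                return OptionalLong.of(e.getKey().snapshotId);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        SnapshotAnnotationStore store = new SnapshotAnnotationStore();
        long snapshotId = 5938475937123456789L; // made-up snapshot id
        store.annotate("db.events", snapshotId, "name", "2021-05-14");
        store.annotate("db.events", snapshotId, "note", "weekly checkpoint");

        System.out.println(store.findByName("db.events", "2021-05-14").getAsLong());
        System.out.println(store.get("db.events", snapshotId).get("note")); // weekly checkpoint
    }
}
```

A time-travel reader would first resolve the name to a snapshot id through such a store and then read that snapshot as usual, so Iceberg metadata never has to be rewritten to add or change a name.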
