Agree that table comment/description belongs in table metadata. +1 to documenting comment as the standard convention (PR #15367 <https://github.com/apache/iceberg/pull/15367>)
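
For anyone reading along, here is a rough sketch of how a tool could pick up descriptions under this convention: the table-level text comes from the `comment` table property, and column-level text comes from the optional `doc` on each field of the current schema. The metadata content and the `describe_table` helper are hypothetical, made up for illustration; only the field names (`properties`, `schemas`, `schema-id`, `current-schema-id`, `fields`, `doc`) follow the table spec.

```python
import json

# Trimmed, hypothetical metadata.json content. Field names follow the
# Iceberg table spec; the values are made up for illustration.
METADATA_JSON = """
{
  "format-version": 2,
  "current-schema-id": 0,
  "schemas": [
    {
      "type": "struct",
      "schema-id": 0,
      "fields": [
        {"id": 1, "name": "order_id", "required": true, "type": "long",
         "doc": "Unique order identifier"},
        {"id": 2, "name": "amount", "required": false, "type": "double"}
      ]
    }
  ],
  "properties": {
    "comment": "One row per customer order"
  }
}
"""


def describe_table(metadata: dict) -> dict:
    """Collect the table comment and per-column docs from parsed metadata."""
    # Table-level description: the "comment" table property (a convention).
    table_comment = metadata.get("properties", {}).get("comment")

    # Column-level descriptions: optional "doc" on fields of the current schema.
    current_id = metadata["current-schema-id"]
    schema = next(s for s in metadata["schemas"] if s["schema-id"] == current_id)
    column_docs = {f["name"]: f["doc"] for f in schema["fields"] if "doc" in f}

    return {"comment": table_comment, "columns": column_docs}


description = describe_table(json.loads(METADATA_JSON))
print(description)
```

Because the comment lives in `properties` rather than in a schema entry, it is read once per table and does not change with `current-schema-id`, which matches the "updated infrequently" expectation discussed below.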
On Fri, Feb 20, 2026 at 9:27 AM Steven Wu <[email protected]> wrote:
> I think the table description or comment belongs in the table metadata. It
> should be updated infrequently. I am not too worried about the table commit.
>
> On Fri, Feb 20, 2026 at 8:13 AM Ryan Blue <[email protected]> wrote:
>
>> You're right that this would require a table commit, but that's the case
>> for almost all other parts of table metadata, including if we were to add a
>> doc field to schemas. We could handle this entirely at the catalog level,
>> but then it would be difficult to pass the data to engines to display.
>>
>> That said, there is other catalog metadata, like `owner`, that we don't
>> track in the table and don't recommend using a table property for, so
>> there's room to have additional catalog-tracked metadata fields passed to
>> REST clients.
>>
>> On Fri, Feb 20, 2026 at 7:34 AM Kevin Liu <[email protected]> wrote:
>>
>>> I've been thinking about this particular use case lately. One drawback
>>> of using the doc or comment property in the Iceberg table metadata is that
>>> updates fall on the table commit path; meaning any update to a comment
>>> will trigger the creation of an additional table snapshot. I think this
>>> side effect is worth documenting.
>>>
>>> Another option for supporting this use case would be to leave it to the
>>> catalogs to co-locate "business metadata" with the table. I've raised a
>>> discussion with the Polaris community [1].
>>>
>>> Best,
>>> Kevin Liu
>>>
>>>
>>> [1] https://github.com/apache/polaris/issues/3222
>>>
>>> On Thu, Feb 19, 2026 at 1:45 AM Guy Yasoor via dev <
>>> [email protected]> wrote:
>>>
>>>> Sure - I opened a PR here: https://github.com/apache/iceberg/pull/15367
>>>>
>>>> On Thu, Feb 19, 2026 at 7:02 AM Steven Wu <[email protected]> wrote:
>>>>
>>>>> It seems that we have a consensus to standardize and document the
>>>>> "comment" table properties.
>>>>> It is useful to provide the semantic context
>>>>> that is super helpful to LLMs. This is also how popular engines like Spark
>>>>> and Trino store the `comment` string from "CREATE TABLE" DDL.
>>>>>
>>>>> Taeyu/Guy, let us know if you are interested in creating a PR for that.
>>>>>
>>>>> On Thu, Aug 7, 2025 at 12:08 PM Ryan Blue <[email protected]> wrote:
>>>>>
>>>>>> I think it's probably a good idea to add more implementation-specific
>>>>>> details to the spec, like the use of "comment" for table documentation.
>>>>>> We recently added a section for this that is clear that these are not
>>>>>> required but are important conventions.
>>>>>>
>>>>>> I would not add "owner" to that section. Storing owner in table
>>>>>> properties is not a good idea because it would either need to be
>>>>>> controlled and overridden by catalogs or would be informational and
>>>>>> untrustworthy. I think that owner is part of catalog metadata, not
>>>>>> table metadata.
>>>>>>
>>>>>> On Thu, Aug 7, 2025 at 9:38 AM Guy Yasoor <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Got it - I now understand better the meaning of "reserved table
>>>>>>> properties", and I agree it shouldn't be touched or expanded.
>>>>>>>
>>>>>>> Going back to the original topic:
>>>>>>> It appears that both `comment` and `owner` are important fields,
>>>>>>> which are populated by some engines, and can prove useful for others,
>>>>>>> but aren't standardized anywhere in the spec.
>>>>>>> To improve engine alignment, I think they should be documented
>>>>>>> somewhere.
>>>>>>> I'd suggest one of two approaches:
>>>>>>>
>>>>>>>    1. Either keeping them in the table properties map, and
>>>>>>>    documenting it in the Table Properties documentation
>>>>>>>    <https://iceberg.apache.org/docs/latest/configuration/#table-properties>
>>>>>>>    (but not in the reserved section - perhaps it deserves its own
>>>>>>>    section, "Table context properties"?)
>>>>>>>    2. Or adding them as optional top-level fields in the
>>>>>>>    metadata.json schema - this might be the "best practice" (especially
>>>>>>>    if `owner` is supposed to be controlled by the catalog). However, it
>>>>>>>    will require changing the current behavior of Spark, both for `owner`
>>>>>>>    assignment, and for `comment` assignment in "CREATE TABLE ... COMMENT
>>>>>>>    'table documentation'".
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2025 at 8:08 PM Ryan Blue <[email protected]> wrote:
>>>>>>>
>>>>>>>> The `format-version` table property is different because it is
>>>>>>>> mapped to the format version that is not stored in table properties.
>>>>>>>> It is reserved because implementations will override it and so it
>>>>>>>> isn't a real table property. This is not a pattern that we want to
>>>>>>>> expand because of the strange behavior.
>>>>>>>>
>>>>>>>> For cases like `comment`, these other properties are normal table
>>>>>>>> properties that can be used like any other. If the schema had a doc
>>>>>>>> string and that was used in place of `comment`, then I think it would
>>>>>>>> be a reserved property. But there's no need for that because setting
>>>>>>>> the property or using `COMMENT ON` would have the same behavior --
>>>>>>>> changing the property value.
>>>>>>>>
>>>>>>>> The `owner` property is a different case. Owner is something that
>>>>>>>> should be restricted. A user should not be able to change it with just
>>>>>>>> access to modify table metadata. Tracking a table's owner is the
>>>>>>>> responsibility of the catalog and its access control scheme. Because of
>>>>>>>> this, I don't think that we should standardize or encourage setting an
>>>>>>>> `owner` table property.
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2025 at 4:21 AM Guy Yasoor
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> If using "comment" is the best practice, should we add this to the
>>>>>>>>> "reserved table properties" docs
>>>>>>>>> <https://iceberg.apache.org/docs/latest/configuration/#reserved-table-properties>,
>>>>>>>>> to make sure it's aligned between different engines and
>>>>>>>>> implementations?
>>>>>>>>> At the same time, I would suggest adding "owner" as
>>>>>>>>> well, which is automatically added by Spark.
>>>>>>>>>
>>>>>>>>> On Tue, Aug 5, 2025 at 2:16 AM Taeyun Kim <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I see, thank you for your response.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Taeyun
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>> Cc:
>>>>>>>>>> Sent: 2025-08-05 (Tue) 07:45:43 (UTC+09:00)
>>>>>>>>>> Subject: Re: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>> Objects
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If there isn't a significant difference between table-level
>>>>>>>>>> description and schema-level description, then I think you should
>>>>>>>>>> consider it standardized. You can store the table description in the
>>>>>>>>>> "comment" table property.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 3, 2025 at 5:28 PM Taeyun Kim <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I’ve already explained my reasoning in earlier messages,
>>>>>>>>>> including the example about making table and column descriptions more
>>>>>>>>>> accessible for LLM‑generated SQL.
>>>>>>>>>> From my perspective, table‑level comments, like column‑level
>>>>>>>>>> comments, should also be standardized.
>>>>>>>>>> If standardized, it seems natural for them to be part of the
>>>>>>>>>> schema definition, just like column‑level comments.
>>>>>>>>>> This way, they stay consistent with the schema version and avoid
>>>>>>>>>> drifting out of sync when the schema changes.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Taeyun
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>> Cc:
>>>>>>>>>> Sent: 2025-07-26 (Sat) 08:05:55 (UTC+09:00)
>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>> Objects
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Why would you need to version table descriptions? Are there cases
>>>>>>>>>> where they are changing rapidly and inaccurate due to schema changes?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Thank you for your reply.
>>>>>>>>>>
>>>>>>>>>> Column-level comments are already part of the schema definition.
>>>>>>>>>> Would adding just one table-level comment really cause noticeable
>>>>>>>>>> bloat? For example, if a table has 20 columns, adding one more
>>>>>>>>>> comment would only increase the metadata size by about 1/20th.
>>>>>>>>>>
>>>>>>>>>> Also, using schema-id as part of the property key feels like a
>>>>>>>>>> workaround rather than a proper solution. It is not part of the
>>>>>>>>>> specification, so any tool or integration (including LLM-based ones)
>>>>>>>>>> would need extra logic to interpret it. A standardized, schema-level
>>>>>>>>>> field would avoid that complexity and make the metadata easier to
>>>>>>>>>> consume consistently.
>>>>>>>>>>
>>>>>>>>>> If bloat is a real concern, perhaps column-level comments should
>>>>>>>>>> also be moved out of the schema, with a proper mechanism to version
>>>>>>>>>> and manage them separately.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Taeyun.
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: "Gang Wu" <[email protected]>
>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>> Cc:
>>>>>>>>>> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>> Objects
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'd rather not complicate the schema definitions in the table
>>>>>>>>>> metadata. You may append `schema-id` to the key of the table property
>>>>>>>>>> to manage different schema versions.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Storing verbose text for each field may bloat the metadata
>>>>>>>>>> storage, especially when there are a lot of duplicate `doc`s if
>>>>>>>>>> schema evolution happens a lot.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Gang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Thank you for your response.
>>>>>>>>>> As I understand it, the table description is currently stored as
>>>>>>>>>> a table property within the table metadata’s `properties` map.
>>>>>>>>>>
>>>>>>>>>> In my opinion, this approach has a few issues:
>>>>>>>>>>
>>>>>>>>>> - Table metadata `properties` are not versioned. As a result,
>>>>>>>>>> when querying an older snapshot, the description may be inaccurate
>>>>>>>>>> because the value reflects only the current state.
>>>>>>>>>> - According to the specification, the purpose of table metadata
>>>>>>>>>> properties is: “A string to string map of table properties. This is
>>>>>>>>>> used to control settings that affect reading and writing and is not
>>>>>>>>>> intended to be used for arbitrary metadata.” Based on this, a comment
>>>>>>>>>> seems to fall under “arbitrary metadata,” and therefore may not be an
>>>>>>>>>> appropriate use of properties.
>>>>>>>>>> - Table comments seem to have become significant enough that
>>>>>>>>>> relying on a convention alone may no longer be sufficient. It might
>>>>>>>>>> be worth considering a standardized, schema-level field for them.
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>> Taeyun
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>> Cc:
>>>>>>>>>> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>> Objects
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Iceberg does allow you to store table descriptions. The
>>>>>>>>>> convention is to use a table property, "comment". While this isn't a
>>>>>>>>>> schema-level doc/comment, I don't know of anything that makes a
>>>>>>>>>> distinction between schema description and table description, so I
>>>>>>>>>> think it should work for your use.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> With the growing trend of using LLMs to automatically generate
>>>>>>>>>> SQL, it feels increasingly important to manage descriptions of
>>>>>>>>>> database tables and columns in a way that these tools can easily
>>>>>>>>>> access.
>>>>>>>>>>
>>>>>>>>>> In the Iceberg specification, comments for schema fields (i.e.,
>>>>>>>>>> columns) can be specified using the `doc` property within the
>>>>>>>>>> `fields` array of a `struct` type. However, there doesn’t seem to be
>>>>>>>>>> a way to specify a comment for the root struct type itself - that is,
>>>>>>>>>> for the table as a whole.
>>>>>>>>>>
>>>>>>>>>> From what I can tell, OLAP DBMSs today may handle table-level
>>>>>>>>>> comments by storing them in the `properties` map within the table
>>>>>>>>>> metadata under various non-standard keys.
>>>>>>>>>> But since a table comment conceptually
>>>>>>>>>> belongs to the schema, and can vary by schema, it feels like the
>>>>>>>>>> `properties` map within the table metadata might not be the best
>>>>>>>>>> place for it.
>>>>>>>>>>
>>>>>>>>>> Would it make sense to allow a `doc` property on the `schema`
>>>>>>>>>> object (the root struct type), alongside `schema-id` and
>>>>>>>>>> `identifier-field-ids`, so that a description for the schema itself
>>>>>>>>>> can be included?
>>>>>>>>>> It seems like it would be helpful, especially for tooling and
>>>>>>>>>> LLM-related use cases.
>>>>>>>>>>
>>>>>>>>>> Curious to hear your thoughts.
>>>>>>>>>> Apologies if I’m overlooking something or if this has already
>>>>>>>>>> been discussed.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Taeyun
>>>>>>>>>
>>>>>>>>>
