Agree that the table comment/description belongs in table metadata. +1 to
documenting `comment` as the standard convention (PR #15367
<https://github.com/apache/iceberg/pull/15367>)
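
For anyone skimming the thread, the convention can be sketched roughly as below. This is only an illustration: it assumes a trimmed-down table metadata document with a `properties` map, and the helper name `table_comment` and the sample description are made up for the example, not part of any Iceberg library.

```python
import json
from typing import Optional

# Hypothetical, trimmed-down table metadata; real metadata files carry many
# more fields. Only the string-to-string "properties" map matters here.
METADATA_JSON = """
{
  "format-version": 2,
  "properties": {
    "comment": "Daily order facts, one row per order line"
  }
}
"""


def table_comment(metadata: dict) -> Optional[str]:
    """Return the description stored under the "comment" property, if any."""
    return metadata.get("properties", {}).get("comment")


metadata = json.loads(METADATA_JSON)
print(table_comment(metadata))
```

A table without the property simply yields `None`, so a reader can treat the description as optional.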

On Fri, Feb 20, 2026 at 9:27 AM Steven Wu <[email protected]> wrote:

> I think the table description or comment belongs in the table metadata. It
> should be updated infrequently. I am not too worried about the table commit.
>
> On Fri, Feb 20, 2026 at 8:13 AM Ryan Blue <[email protected]> wrote:
>
>> You're right that this would require a table commit, but that's the case
>> for almost all other parts of table metadata, including if we were to add a
>> doc field to schemas. We could handle this entirely at the catalog level,
>> but then it would be difficult to pass the data to engines to display.
>>
>> That said, there is other catalog metadata, like `owner`, that we don't
>> track in the table and don't recommend using a table property for, so
>> there's room to have additional catalog-tracked metadata fields passed to
>> REST clients.
>>
>> On Fri, Feb 20, 2026 at 7:34 AM Kevin Liu <[email protected]> wrote:
>>
>>> I've been thinking about this particular use case lately. One drawback
>>> of using the doc or comment property in the Iceberg table metadata is that
>>> updates fall on the table commit path, meaning any update to a comment
>>> will trigger an additional table commit (a new metadata file). I think
>>> this side effect is worth documenting.
>>>
>>> Another option for supporting this use case would be to leave it to the
>>> catalogs to co-locate "business metadata" with the table. I've raised a
>>> discussion with the Polaris community [1].
>>>
>>> Best,
>>> Kevin Liu
>>>
>>>
>>> [1] https://github.com/apache/polaris/issues/3222
>>>
>>> On Thu, Feb 19, 2026 at 1:45 AM Guy Yasoor via dev <
>>> [email protected]> wrote:
>>>
>>>> Sure - I opened a PR here: https://github.com/apache/iceberg/pull/15367
>>>>
>>>> On Thu, Feb 19, 2026 at 7:02 AM Steven Wu <[email protected]> wrote:
>>>>
>>>>> It seems that we have a consensus to standardize and document the
>>>>> "comment" table property. It provides semantic context
>>>>> that is super helpful to LLMs. This is also how popular engines like Spark
>>>>> and Trino store the `comment` string from "CREATE TABLE" DDL.
>>>>>
>>>>> Taeyu/Guy, let us know if you are interested in creating a PR for that.
>>>>>
>>>>> On Thu, Aug 7, 2025 at 12:08 PM Ryan Blue <[email protected]> wrote:
>>>>>
>>>>>> I think it's probably a good idea to add more implementation-specific
>>>>>> details to the spec, like the use of "comment" for table documentation.
>>>>>> We recently added a section for this that is clear that these are not
>>>>>> required but are important conventions.
>>>>>>
>>>>>> I would not add "owner" to that section. Storing owner in table
>>>>>> properties is not a good idea because it would either need to be
>>>>>> controlled and overridden by catalogs or would be informational and
>>>>>> untrustworthy. I think that owner is part of catalog metadata, not table
>>>>>> metadata.
>>>>>>
>>>>>> On Thu, Aug 7, 2025 at 9:38 AM Guy Yasoor <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Got it - I now understand better the meaning of "reserved table
>>>>>>> properties", and I agree it shouldn't be touched or expanded.
>>>>>>>
>>>>>>> Going back to the original topic:
>>>>>>> It appears that both `comment` and `owner` are important fields,
>>>>>>> which are populated by some engines, and can prove useful for others,
>>>>>>> but aren't standardized anywhere in the spec.
>>>>>>> To improve engine alignment, I think they should be documented
>>>>>>> somewhere.
>>>>>>> I'd suggest one of two approaches:
>>>>>>>
>>>>>>>    1. Either keeping them in the table properties map, and
>>>>>>>    documenting it in the Table Properties documentation
>>>>>>>    <https://iceberg.apache.org/docs/latest/configuration/#table-properties>
>>>>>>>    (but not in the reserved section - perhaps it deserves its own
>>>>>>>    section, "Table context properties"?)
>>>>>>>    2. Or adding them as optional top-level fields in the
>>>>>>>    metadata.json schema - this might be the "best practice" (especially
>>>>>>>    if `owner` is supposed to be controlled by the catalog). However, it
>>>>>>>    will require changing the current behavior of Spark, both for `owner`
>>>>>>>    assignment, and for `comment` assignment in "CREATE TABLE ... COMMENT
>>>>>>>    'table documentation'".
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2025 at 8:08 PM Ryan Blue <[email protected]> wrote:
>>>>>>>
>>>>>>>> The `format-version` table property is different because it is
>>>>>>>> mapped to the format version that is not stored in table properties.
>>>>>>>> It is reserved because implementations will override it and so it
>>>>>>>> isn't a real table property. This is not a pattern that we want to
>>>>>>>> expand because of the strange behavior.
>>>>>>>>
>>>>>>>> For cases like `comment`, these other properties are normal table
>>>>>>>> properties that can be used like any other. If the schema had a doc
>>>>>>>> string and that was used in place of `comment`, then I think it would
>>>>>>>> be a reserved property. But there's no need for that because setting
>>>>>>>> the property or using `COMMENT ON` would have the same behavior --
>>>>>>>> changing the property value.
>>>>>>>>
>>>>>>>> The `owner` property is a different case. Owner is something that
>>>>>>>> should be restricted. A user should not be able to change it with just
>>>>>>>> access to modify table metadata. Tracking a table's owner is the
>>>>>>>> responsibility of the catalog and its access control scheme. Because of
>>>>>>>> this, I don't think that we should standardize or encourage setting an
>>>>>>>> `owner` table property.
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2025 at 4:21 AM Guy Yasoor
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> If using "comment" is the best practice, should we add this to the
>>>>>>>>> "reserved table properties" docs
>>>>>>>>> <https://iceberg.apache.org/docs/latest/configuration/#reserved-table-properties>,
>>>>>>>>> to make sure it's aligned between different engines and
>>>>>>>>> implementations?
>>>>>>>>> While we're at it, I would suggest adding "owner" as
>>>>>>>>> well, which is automatically added by Spark.
>>>>>>>>>
>>>>>>>>> On Tue, Aug 5, 2025 at 2:16 AM Taeyun Kim <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I see, thank you for your response.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Taeyun
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>> Cc:
>>>>>>>>>> Sent: 2025-08-05 (Tue) 07:45:43 (UTC+09:00)
>>>>>>>>>> Subject: Re: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>> Objects
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If there isn't a significant difference between table-level
>>>>>>>>>> description and schema-level description, then I think you should
>>>>>>>>>> consider it standardized. You can store the table description in the
>>>>>>>>>> "comment" table property.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 3, 2025 at 5:28 PM Taeyun Kim <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I’ve already explained my reasoning in earlier messages,
>>>>>>>>>> including the example about making table and column descriptions more
>>>>>>>>>> accessible for LLM‑generated SQL.
>>>>>>>>>> From my perspective, table‑level comments, like column‑level
>>>>>>>>>> comments, should also be standardized.
>>>>>>>>>> If standardized, it seems natural for them to be part of the
>>>>>>>>>> schema definition, just like column‑level comments.
>>>>>>>>>> This way, they stay consistent with the schema version and avoid
>>>>>>>>>> drifting out of sync when the schema changes.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Taeyun
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>> Cc:
>>>>>>>>>> Sent: 2025-07-26 (Sat) 08:05:55 (UTC+09:00)
>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>> Objects
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Why would you need to version table descriptions? Are there cases
>>>>>>>>>> where they are changing rapidly and inaccurate due to schema changes?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Thank you for your reply.
>>>>>>>>>>
>>>>>>>>>> Column-level comments are already part of the schema definition.
>>>>>>>>>> Would adding just one table-level comment really cause noticeable
>>>>>>>>>> bloat? For example, if a table has 20 columns, adding one more comment
>>>>>>>>>> would only increase the metadata size by about 1/20th.
>>>>>>>>>>
>>>>>>>>>> Also, using schema-id as part of the property key feels like a
>>>>>>>>>> workaround rather than a proper solution. It is not part of the
>>>>>>>>>> specification, so any tool or integration (including LLM-based ones)
>>>>>>>>>> would need extra logic to interpret it. A standardized, schema-level
>>>>>>>>>> field would avoid that complexity and make the metadata easier to
>>>>>>>>>> consume consistently.
>>>>>>>>>>
>>>>>>>>>> If bloat is a real concern, perhaps column-level comments should
>>>>>>>>>> also be moved out of the schema, with a proper mechanism to version
>>>>>>>>>> and manage them separately.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Taeyun.
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: "Gang Wu" <[email protected]>
>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>> Cc:
>>>>>>>>>> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>> Objects
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I'd rather not complicate the schema definitions in the table
>>>>>>>>>> metadata. You may append `schema-id` to the key of the table property
>>>>>>>>>> to manage different schema versions.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Storing verbose text in each field may bloat the metadata
>>>>>>>>>> storage, especially when there are a lot of duplicate `doc`s if
>>>>>>>>>> schema evolution happens a lot.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Gang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Thank you for your response.
>>>>>>>>>> As I understand it, the table description is currently stored as
>>>>>>>>>> a table property within the table metadata’s `properties` map.
>>>>>>>>>>
>>>>>>>>>> In my opinion, this approach has a few issues:
>>>>>>>>>>
>>>>>>>>>> - Table metadata `properties` are not versioned. As a result,
>>>>>>>>>> when querying an older snapshot, the description may be inaccurate
>>>>>>>>>> because the value reflects only the current state.
>>>>>>>>>> - According to the specification, the purpose of table metadata
>>>>>>>>>> properties is: “A string to string map of table properties. This is
>>>>>>>>>> used to control settings that affect reading and writing and is not
>>>>>>>>>> intended to be used for arbitrary metadata.” Based on this, a comment
>>>>>>>>>> seems to fall under “arbitrary metadata,” and therefore may not be an
>>>>>>>>>> appropriate use of properties.
>>>>>>>>>> - Table comments seem to have become significant enough that
>>>>>>>>>> relying on a convention alone may no longer be sufficient. It might
>>>>>>>>>> be worth considering a standardized, schema-level field for them.
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>> Taeyun
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>>> To: <[email protected]>;
>>>>>>>>>> Cc:
>>>>>>>>>> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
>>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>>> Objects
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Iceberg does allow you to store table descriptions. The
>>>>>>>>>> convention is to use a table property, "comment". While this isn't a
>>>>>>>>>> schema-level doc/comment, I don't know of anything that makes a
>>>>>>>>>> distinction between schema description and table description, so I
>>>>>>>>>> think it should work for your use.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> With the growing trend of using LLMs to automatically generate
>>>>>>>>>> SQL, it feels increasingly important to manage descriptions of
>>>>>>>>>> database tables and columns in a way that these tools can easily
>>>>>>>>>> access.
>>>>>>>>>>
>>>>>>>>>> In the Iceberg specification, comments for schema fields (i.e.,
>>>>>>>>>> columns) can be specified using the `doc` property within the
>>>>>>>>>> `fields` array of a `struct` type. However, there doesn’t seem to be a
>>>>>>>>>> way to specify a comment for the root struct type itself - that is, for
>>>>>>>>>> the table as a whole.
>>>>>>>>>>
>>>>>>>>>> From what I can tell, OLAP DBMSs today may handle table-level
>>>>>>>>>> comments by storing them in the `properties` map within the table
>>>>>>>>>> metadata under various non-standard keys. But since a table comment
>>>>>>>>>> conceptually belongs to the schema, and can vary by schema, it feels
>>>>>>>>>> like the `properties` map within the table metadata might not be the
>>>>>>>>>> best place for it.
>>>>>>>>>>
>>>>>>>>>> Would it make sense to allow a `doc` property on the `schema`
>>>>>>>>>> object (the root struct type), alongside `schema-id` and
>>>>>>>>>> `identifier-field-ids`, so that a description for the schema itself
>>>>>>>>>> can be included?
>>>>>>>>>> It seems like it would be helpful, especially for tooling and
>>>>>>>>>> LLM-related use cases.
>>>>>>>>>>
>>>>>>>>>> Curious to hear your thoughts.
>>>>>>>>>> Apologies if I’m overlooking something or if this has already
>>>>>>>>>> been discussed.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Taeyun
>>>>>>>>>
>>>>>>>>>
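
To make the original question concrete, here is a rough sketch of how a tool might gather descriptions for an LLM prompt: column text comes from the per-field `doc` entries in the current schema, and table text comes from the `comment` property that the thread settled on. The metadata below is a hypothetical, trimmed-down example, and the helper names `current_schema` and `describe_table` are invented for illustration. The schema-level `doc` proposed in the first message is not part of the spec, so the sketch does not read one.

```python
# Hypothetical, trimmed-down table metadata for illustration only.
metadata = {
    "current-schema-id": 1,
    "schemas": [
        {
            "type": "struct",
            "schema-id": 1,
            "fields": [
                {"id": 1, "name": "order_id", "required": True,
                 "type": "long", "doc": "Unique order identifier"},
                {"id": 2, "name": "amount", "required": False,
                 "type": "double"},
            ],
        }
    ],
    "properties": {"comment": "One row per customer order"},
}


def current_schema(meta: dict) -> dict:
    """Find the schema whose schema-id matches current-schema-id."""
    for schema in meta["schemas"]:
        if schema["schema-id"] == meta["current-schema-id"]:
            return schema
    raise ValueError("current schema not found in metadata")


def describe_table(meta: dict) -> str:
    """Build a plain-text summary from the table comment and column docs."""
    lines = []
    comment = meta.get("properties", {}).get("comment")
    if comment:
        lines.append(f"table: {comment}")
    for field in current_schema(meta)["fields"]:
        # "doc" is optional on a field, so fall back to a placeholder.
        doc = field.get("doc", "(no description)")
        lines.append(f"{field['name']} ({field['type']}): {doc}")
    return "\n".join(lines)


print(describe_table(metadata))
```

Because the table description lives in `properties`, it reflects only the current state of the table, which is the trade-off raised earlier in the thread.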
