I think the table description or comment belongs in the table metadata. It
should be updated infrequently. I am not too worried about the table commit.

On Fri, Feb 20, 2026 at 8:13 AM Ryan Blue <[email protected]> wrote:

> You're right that this would require a table commit, but that's the case
> for almost all other parts of table metadata, including if we were to add a
> doc field to schemas. We could handle this entirely at the catalog level,
> but then it would be difficult to pass the data to engines to display.
>
> That said, there is other catalog metadata, like `owner`, that we don't
> track in the table and don't recommend using a table property for, so
> there's room to have additional catalog-tracked metadata fields passed to
> REST clients.
>
> On Fri, Feb 20, 2026 at 7:34 AM Kevin Liu <[email protected]> wrote:
>
>> I've been thinking about this particular use case lately. One drawback of
>> using the doc or comment property in the Iceberg table metadata is that
>> updates fall on the table commit path, meaning any update to a comment
>> will trigger the creation of an additional table snapshot. I think this
>> side effect is worth documenting.
>>
>> Another option for supporting this use case would be to leave it to the
>> catalogs to co-locate "business metadata" with the table. I've raised a
>> discussion with the Polaris community [1].
>>
>> Best,
>> Kevin Liu
>>
>>
>> [1] https://github.com/apache/polaris/issues/3222
>>
>> On Thu, Feb 19, 2026 at 1:45 AM Guy Yasoor via dev <
>> [email protected]> wrote:
>>
>>> Sure - I opened a PR here: https://github.com/apache/iceberg/pull/15367
>>>
>>> On Thu, Feb 19, 2026 at 7:02 AM Steven Wu <[email protected]> wrote:
>>>
>>>> It seems that we have a consensus to standardize and document the
>>>> "comment" table property. It is useful to provide the semantic context
>>>> that is super helpful to LLMs. This is also how popular engines like Spark
>>>> and Trino store the `comment` string from "CREATE TABLE" DDL.
>>>>
>>>> Taeyun/Guy, let us know if you are interested in creating a PR for that.
>>>>
>>>> On Thu, Aug 7, 2025 at 12:08 PM Ryan Blue <[email protected]> wrote:
>>>>
>>>>> I think it's probably a good idea to add more implementation-specific
>>>>> details to the spec, like the use of "comment" for table documentation. We
>>>>> recently added a section for this that makes clear these are not required
>>>>> but are important conventions.
>>>>>
>>>>> I would not add "owner" to that section. Storing owner in table
>>>>> properties is not a good idea because it would either need to be controlled
>>>>> and overridden by catalogs or would be informational and untrustworthy. I
>>>>> think that owner is part of catalog metadata, not table metadata.
>>>>>
>>>>> On Thu, Aug 7, 2025 at 9:38 AM Guy Yasoor <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Got it - I now understand better the meaning of "reserved table
>>>>>> properties", and I agree it shouldn't be touched or expanded.
>>>>>>
>>>>>> Going back to the original topic:
>>>>>> It appears that both `comment` and `owner` are important fields,
>>>>>> which are populated by some engines, and can prove useful for others, but
>>>>>> aren't standardized anywhere in the spec.
>>>>>> To improve engine alignment, I think they should be documented
>>>>>> somewhere.
>>>>>> I'd suggest one of two approaches:
>>>>>>
>>>>>>    1. Either keeping them in the table properties map, and
>>>>>>    documenting them in the Table Properties documentation
>>>>>>    <https://iceberg.apache.org/docs/latest/configuration/#table-properties>
>>>>>>    (but not in the reserved section - perhaps they deserve their own
>>>>>>    section, "Table context properties"?)
>>>>>>    2. Or adding them as optional top-level fields in the
>>>>>>    metadata.json schema - this might be the "best practice" (especially
>>>>>>    if `owner` is supposed to be controlled by the catalog). However, it
>>>>>>    will require changing the current behavior of Spark, both for `owner`
>>>>>>    assignment and for `comment` assignment in "CREATE TABLE ... COMMENT
>>>>>>    'table documentation'".
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2025 at 8:08 PM Ryan Blue <[email protected]> wrote:
>>>>>>
>>>>>>> The `format-version` table property is different because it is
>>>>>>> mapped to the format version, which is not stored in table properties.
>>>>>>> It is reserved because implementations will override it and so it isn't
>>>>>>> a real table property. This is not a pattern that we want to expand
>>>>>>> because of the strange behavior.
>>>>>>>
>>>>>>> For cases like `comment`, these other properties are normal table
>>>>>>> properties that can be used like any other. If the schema had a doc
>>>>>>> string and that was used in place of `comment`, then I think it would be
>>>>>>> a reserved property. But there's no need for that because setting the
>>>>>>> property or using `COMMENT ON` would have the same behavior -- changing
>>>>>>> the property value.
>>>>>>>
>>>>>>> The `owner` property is a different case. Owner is something that
>>>>>>> should be restricted. A user should not be able to change it with just
>>>>>>> access to modify table metadata. Tracking a table's owner is the
>>>>>>> responsibility of the catalog and its access control scheme. Because of
>>>>>>> this, I don't think that we should standardize or encourage setting an
>>>>>>> `owner` table property.
>>>>>>>
>>>>>>> On Tue, Aug 5, 2025 at 4:21 AM Guy Yasoor <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If using "comment" is the best practice, should we add this to the
>>>>>>>> "reserved table properties" docs
>>>>>>>> <https://iceberg.apache.org/docs/latest/configuration/#reserved-table-properties>,
>>>>>>>> to make sure it's aligned between different engines and implementations?
>>>>>>>> While we're at it, I would suggest adding "owner" as well, which is
>>>>>>>> automatically added by Spark.
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2025 at 2:16 AM Taeyun Kim <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I see, thank you for your response.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Taeyun
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>> To: <[email protected]>;
>>>>>>>>> Cc:
>>>>>>>>> Sent: 2025-08-05 (Tue) 07:45:43 (UTC+09:00)
>>>>>>>>> Subject: Re: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>>> Objects
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If there isn't a significant difference between table-level
>>>>>>>>> description and schema-level description, then I think you should
>>>>>>>>> consider it standardized. You can store the table description in the
>>>>>>>>> "comment" table property.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Aug 3, 2025 at 5:28 PM Taeyun Kim <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I’ve already explained my reasoning in earlier messages, including
>>>>>>>>> the example about making table and column descriptions more accessible
>>>>>>>>> for LLM-generated SQL.
>>>>>>>>> From my perspective, table-level comments, like column-level
>>>>>>>>> comments, should also be standardized.
>>>>>>>>> If standardized, it seems natural for them to be part of the
>>>>>>>>> schema definition, just like column-level comments.
>>>>>>>>> This way, they stay consistent with the schema version and avoid
>>>>>>>>> drifting out of sync when the schema changes.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Taeyun
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>> To: <[email protected]>;
>>>>>>>>> Cc:
>>>>>>>>> Sent: 2025-07-26 (Sat) 08:05:55 (UTC+09:00)
>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Why would you need to version table descriptions? Are there cases
>>>>>>>>> where they are changing rapidly and inaccurate due to schema changes?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Thank you for your reply.
>>>>>>>>>
>>>>>>>>> Column-level comments are already part of the schema definition.
>>>>>>>>> Would adding just one table-level comment really cause noticeable
>>>>>>>>> bloat? For example, if a table has 20 columns, adding one more comment
>>>>>>>>> would only increase the metadata size by about 1/20th.
>>>>>>>>>
>>>>>>>>> Also, using schema-id as part of the property key feels like a
>>>>>>>>> workaround rather than a proper solution. It is not part of the
>>>>>>>>> specification, so any tool or integration (including LLM-based ones)
>>>>>>>>> would need extra logic to interpret it. A standardized, schema-level
>>>>>>>>> field would avoid that complexity and make the metadata easier to
>>>>>>>>> consume consistently.
>>>>>>>>>
>>>>>>>>> If bloat is a real concern, perhaps column-level comments should
>>>>>>>>> also be moved out of the schema, with a proper mechanism to version
>>>>>>>>> and manage them separately.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Taeyun.
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: "Gang Wu" <[email protected]>
>>>>>>>>> To: <[email protected]>;
>>>>>>>>> Cc:
>>>>>>>>> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'd rather not complicate the schema definitions in the table
>>>>>>>>> metadata. You may append `schema-id` to the table property key to
>>>>>>>>> manage different schema versions.
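To make the schema-id workaround concrete, here is a minimal Python sketch. It assumes a hypothetical `comment.<schema-id>` key layout (the suggestion above does not fix one, and Iceberg does not define it), a trimmed metadata fragment, and a hypothetical helper `description_for_snapshot`. It resolves a snapshot's description through the snapshot's `schema-id` and falls back to the plain `comment` property, which only reflects the current state.

```python
import json
from typing import Optional

# Illustrative, trimmed table metadata. The "comment.<schema-id>" keys are a
# hypothetical convention for this sketch only; they are not part of the spec.
METADATA_JSON = """
{
  "format-version": 2,
  "current-schema-id": 1,
  "snapshots": [
    {"snapshot-id": 101, "schema-id": 0},
    {"snapshot-id": 202, "schema-id": 1}
  ],
  "properties": {
    "comment": "Orders, including refund columns",
    "comment.0": "Orders",
    "comment.1": "Orders, including refund columns"
  }
}
"""


def description_for_snapshot(metadata: dict, snapshot_id: int) -> Optional[str]:
    """Resolve the description for the schema a given snapshot was written with."""
    props = metadata.get("properties", {})
    # Find the schema-id recorded on the requested snapshot, if any.
    schema_id = next(
        (s.get("schema-id") for s in metadata.get("snapshots", [])
         if s["snapshot-id"] == snapshot_id),
        None,
    )
    if schema_id is not None:
        keyed = props.get(f"comment.{schema_id}")
        if keyed is not None:
            return keyed
    # Fall back to the plain "comment" property (current state only).
    return props.get("comment")


metadata = json.loads(METADATA_JSON)
print(description_for_snapshot(metadata, 101))  # Orders
print(description_for_snapshot(metadata, 202))  # Orders, including refund columns
```

Every consumer would need this lookup-and-fallback logic, since nothing in the spec tells a reader that such keys exist.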
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Storing verbose text in each field may bloat the metadata storage,
>>>>>>>>> especially when there are a lot of duplicate `doc`s if schema
>>>>>>>>> evolution happens frequently.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Gang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Thank you for your response.
>>>>>>>>> As I understand it, the table description is currently stored as a
>>>>>>>>> table property within the table metadata’s `properties` map.
>>>>>>>>>
>>>>>>>>> In my opinion, this approach has a few issues:
>>>>>>>>>
>>>>>>>>> - Table metadata `properties` are not versioned. As a result, when
>>>>>>>>> querying an older snapshot, the description may be inaccurate because
>>>>>>>>> the value reflects only the current state.
>>>>>>>>> - According to the specification, the purpose of table metadata
>>>>>>>>> properties is: “A string to string map of table properties. This is
>>>>>>>>> used to control settings that affect reading and writing and is not
>>>>>>>>> intended to be used for arbitrary metadata.” Based on this, a comment
>>>>>>>>> seems to fall under “arbitrary metadata,” and therefore may not be an
>>>>>>>>> appropriate use of properties.
>>>>>>>>> - Table comments seem to have become significant enough that
>>>>>>>>> relying on a convention alone may no longer be sufficient. It might be
>>>>>>>>> worth considering a standardized, schema-level field for them.
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>> Taeyun
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>>> To: <[email protected]>;
>>>>>>>>> Cc:
>>>>>>>>> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
>>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Iceberg does allow you to store table descriptions. The convention
>>>>>>>>> is to use a table property, "comment". While this isn't a schema-level
>>>>>>>>> doc/comment, I don't know of anything that makes a distinction between
>>>>>>>>> schema description and table description, so I think it should work
>>>>>>>>> for your use.
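A minimal Python sketch of how a tool (for example, one assembling context for an LLM) might read a table description under the "comment" convention. Only the `properties` map and the `comment` key come from the thread; the trimmed metadata fragment, its sample values, and the helper name `table_description` are illustrative assumptions. A real metadata.json carries many more required fields.

```python
import json
from typing import Optional

# Trimmed, illustrative table metadata. Real metadata.json files also carry
# table-uuid, location, schemas, snapshots, and other required fields.
METADATA_JSON = """
{
  "format-version": 2,
  "current-schema-id": 0,
  "properties": {
    "comment": "Daily order events, one row per order line",
    "write.format.default": "parquet"
  }
}
"""


def table_description(metadata_json: str) -> Optional[str]:
    """Return the description stored under the "comment" table property, if any."""
    metadata = json.loads(metadata_json)
    # The properties map may be absent, so default to an empty dict.
    return metadata.get("properties", {}).get("comment")


print(table_description(METADATA_JSON))
```

Because the value lives in the shared properties map, it reflects only the current table state, regardless of which snapshot is being read.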
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> With the growing trend of using LLMs to automatically generate
>>>>>>>>> SQL, it feels increasingly important to manage descriptions of
>>>>>>>>> database tables and columns in a way that these tools can easily
>>>>>>>>> access.
>>>>>>>>>
>>>>>>>>> In the Iceberg specification, comments for schema fields (i.e.,
>>>>>>>>> columns) can be specified using the `doc` property within the `fields`
>>>>>>>>> array of a `struct` type. However, there doesn’t seem to be a way to
>>>>>>>>> specify a comment for the root struct type itself - that is, for the
>>>>>>>>> table as a whole.
>>>>>>>>>
>>>>>>>>> From what I can tell, OLAP DBMSs today may handle table-level
>>>>>>>>> comments by storing them in the `properties` map within the table
>>>>>>>>> metadata under various non-standard keys. But since a table comment
>>>>>>>>> conceptually belongs to the schema, and can vary by schema, it feels
>>>>>>>>> like the `properties` map within the table metadata might not be the
>>>>>>>>> best place for it.
>>>>>>>>>
>>>>>>>>> Would it make sense to allow a `doc` property on the `schema`
>>>>>>>>> object (the root struct type), alongside `schema-id` and
>>>>>>>>> `identifier-field-ids`, so that a description for the schema itself
>>>>>>>>> can be included?
>>>>>>>>> It seems like it would be helpful, especially for tooling and
>>>>>>>>> LLM-related use cases.
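A hedged Python sketch of what the proposal would look like to a consumer. The schema object follows the shape the spec uses (a root `struct` with `schema-id`, `identifier-field-ids`, and `fields`, where each field may carry an optional `doc`). The root-level `doc` key is the hypothetical addition under discussion, not part of the current spec. The helper `describe_schema` is likewise illustrative, rendering table- and column-level descriptions into plain text that could be fed to an LLM.

```python
import json

# Schema object in the spec's JSON shape. Field-level "doc" exists today;
# the root-level "doc" is the proposed (hypothetical) addition.
SCHEMA_JSON = """
{
  "type": "struct",
  "schema-id": 1,
  "identifier-field-ids": [1],
  "doc": "One row per customer order",
  "fields": [
    {"id": 1, "name": "order_id", "required": true, "type": "long",
     "doc": "Unique order identifier"},
    {"id": 2, "name": "amount", "required": false, "type": "decimal(10,2)"}
  ]
}
"""


def describe_schema(schema: dict) -> str:
    """Render table- and column-level descriptions as plain text for a prompt."""
    # Root-level doc is optional (and hypothetical), so default gracefully.
    lines = [f"Table: {schema.get('doc', '(no description)')}"]
    for field in schema["fields"]:
        # Field-level doc is optional in the spec.
        doc = field.get("doc", "(no description)")
        lines.append(f"- {field['name']} ({field['type']}): {doc}")
    return "\n".join(lines)


print(describe_schema(json.loads(SCHEMA_JSON)))
```

With a root-level `doc`, the table description would travel with each schema version, so resolving the schema for an older snapshot would also yield the matching description.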
>>>>>>>>>
>>>>>>>>> Curious to hear your thoughts.
>>>>>>>>> Apologies if I’m overlooking something or if this has already been
>>>>>>>>> discussed.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Taeyun
>>>>>>>>
>>>>>>>>
