You're right that this would require a table commit, but that's the case
for almost every other part of table metadata, and it would also be the
case if we added a doc field to schemas. We could handle this entirely at
the catalog level, but then it would be difficult to pass the data to
engines to display.

That said, there is other catalog metadata, like `owner`, that we don't
track in the table and don't recommend using a table property for, so
there's room to have additional catalog-tracked metadata fields passed to
REST clients.
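
To make the distinction concrete, here is a minimal sketch, not how any
real catalog is implemented. The `TableMetadata` and `CatalogEntry` classes
and their methods are hypothetical names used only for illustration. It
shows a `comment` property change going through a table commit (a new
metadata version), while a catalog-tracked `owner` changes without touching
table metadata.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Simplified, hypothetical stand-in for an Iceberg metadata file."""
    version: int = 1
    properties: dict = field(default_factory=dict)


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: current table metadata plus catalog-only fields."""
    metadata: TableMetadata
    owner: str = ""

    def commit_properties(self, updates: dict) -> None:
        # A property change replaces the table metadata, i.e. a table commit.
        merged = {**self.metadata.properties, **updates}
        self.metadata = TableMetadata(self.metadata.version + 1, merged)

    def set_owner(self, owner: str) -> None:
        # Catalog-tracked field: no new table metadata version is written.
        self.owner = owner


entry = CatalogEntry(TableMetadata())
entry.commit_properties({"comment": "Daily order facts"})
entry.set_owner("data-platform")
print(entry.metadata.version, entry.metadata.properties, entry.owner)
```

In this model only `commit_properties` bumps the metadata version, which is
the trade-off under discussion.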

On Fri, Feb 20, 2026 at 7:34 AM Kevin Liu <[email protected]> wrote:

> I've been thinking about this particular use case lately. One drawback of
> using the doc or comment property in the Iceberg table metadata is that
> updates fall on the table commit path, meaning any update to a comment
> will trigger the creation of an additional table snapshot. I think this
> side effect is worth documenting.
>
> Another option for supporting this use case would be to leave it to the
> catalogs to co-locate "business metadata" with the table. I've raised a
> discussion with the Polaris community [1].
>
> Best,
> Kevin Liu
>
>
> [1] https://github.com/apache/polaris/issues/3222
>
> On Thu, Feb 19, 2026 at 1:45 AM Guy Yasoor via dev <[email protected]>
> wrote:
>
>> Sure - I opened a PR here: https://github.com/apache/iceberg/pull/15367
>>
>> On Thu, Feb 19, 2026 at 7:02 AM Steven Wu <[email protected]> wrote:
>>
>>> It seems that we have a consensus to standardize and document the
>>> "comment" table property. It provides semantic context that is
>>> especially helpful to LLMs. This is also how popular engines like Spark
>>> and Trino store the `comment` string from "CREATE TABLE" DDL.
>>>
>>> Taeyu/Guy, let us know if you are interested in creating a PR for that.
>>>
>>> On Thu, Aug 7, 2025 at 12:08 PM Ryan Blue <[email protected]> wrote:
>>>
>>>> I think it's probably a good idea to add more implementation-specific
>>>> details to the spec, like the use of "comment" for table documentation. We
>>>> recently added a section for this that is clear that these are not required
>>>> but are important conventions.
>>>>
>>>> I would not add "owner" to that section. Storing owner in table
>>>> properties is not a good idea because it would either need to be controlled
>>>> and overridden by catalogs or would be informational and untrustworthy. I
>>>> think that owner is part of catalog metadata, not table metadata.
>>>>
>>>> On Thu, Aug 7, 2025 at 9:38 AM Guy Yasoor <[email protected]>
>>>> wrote:
>>>>
>>>>> Got it - I now understand better the meaning of "reserved table
>>>>> properties", and I agree it shouldn't be touched or expanded.
>>>>>
>>>>> Going back to the original topic:
>>>>> It appears that both `comment` and `owner` are important fields, which
>>>>> are populated by some engines, and can prove useful for others, but aren't
>>>>> standardized anywhere in the spec.
>>>>> To improve engine alignment, I think they should be documented
>>>>> somewhere.
>>>>> I'd suggest one of two approaches:
>>>>>
>>>>>    1. Either keeping them in the table properties map, and
>>>>>    documenting it in the Table Properties documentation
>>>>>    <https://iceberg.apache.org/docs/latest/configuration/#table-properties>
>>>>>    (but not in the reserved section - perhaps it deserves its own
>>>>>    section, "Table context properties"?)
>>>>>    2. Or adding them as optional top-level fields in the
>>>>>    metadata.json schema - this might be the "best practice" (especially if
>>>>>    `owner` is supposed to be controlled by the catalog). However, it will
>>>>>    require changing the current behavior of Spark, both for `owner`
>>>>>    assignment, and for `comment` assignment in "CREATE TABLE ... COMMENT
>>>>>    'table documentation'".
>>>>>
>>>>> WDYT?
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2025 at 8:08 PM Ryan Blue <[email protected]> wrote:
>>>>>
>>>>>> The `format-version` table property is different because it is mapped
>>>>>> to the format version that is not stored in table properties. It is
>>>>>> reserved because implementations will override it and so it isn't a real
>>>>>> table property. This is not a pattern that we want to expand because of
>>>>>> the strange behavior.
>>>>>>
>>>>>> For cases like `comment`, these other properties are normal table
>>>>>> properties that can be used like any other. If the schema had a doc
>>>>>> string and that was used in place of `comment`, then I think it would be
>>>>>> a reserved property. But there's no need for that because setting the
>>>>>> property or using `COMMENT ON` would have the same behavior -- changing
>>>>>> the property value.
>>>>>>
>>>>>> The `owner` property is a different case. Owner is something that
>>>>>> should be restricted. A user should not be able to change it with just
>>>>>> access to modify table metadata. Tracking a table's owner is the
>>>>>> responsibility of the catalog and its access control scheme. Because of
>>>>>> this, I don't think that we should standardize or encourage setting an
>>>>>> `owner` table property.
>>>>>>
>>>>>> On Tue, Aug 5, 2025 at 4:21 AM Guy Yasoor <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> If using "comment" is the best practice, should we add this to the
>>>>>>> "reserved table properties" docs
>>>>>>> <https://iceberg.apache.org/docs/latest/configuration/#reserved-table-properties>,
>>>>>>> to make sure it's aligned between different engines and implementations?
>>>>>>> While we are at it, I would suggest adding "owner" as well, which is
>>>>>>> automatically added by Spark.
>>>>>>>
>>>>>>> On Tue, Aug 5, 2025 at 2:16 AM Taeyun Kim <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I see, thank you for your response.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Taeyun
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>> To: <[email protected]>;
>>>>>>>> Cc:
>>>>>>>> Sent: 2025-08-05 (Tue) 07:45:43 (UTC+09:00)
>>>>>>>> Subject: Re: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>>>> Objects
>>>>>>>>
>>>>>>>>
>>>>>>>> If there isn't a significant difference between table-level
>>>>>>>> description and schema-level description, then I think you should
>>>>>>>> consider it standardized. You can store the table description in the
>>>>>>>> "comment" table property.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Aug 3, 2025 at 5:28 PM Taeyun Kim <
>>>>>>>> [email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I’ve already explained my reasoning in earlier messages, including
>>>>>>>> the example about making table and column descriptions more accessible
>>>>>>>> for LLM‑generated SQL.
>>>>>>>> From my perspective, table‑level comments, like column‑level
>>>>>>>> comments, should also be standardized.
>>>>>>>> If standardized, it seems natural for them to be part of the schema
>>>>>>>> definition, just like column‑level comments.
>>>>>>>> This way, they stay consistent with the schema version and avoid
>>>>>>>> drifting out of sync when the schema changes.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Taeyun
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>> To: <[email protected]>;
>>>>>>>> Cc:
>>>>>>>> Sent: 2025-07-26 (Sat) 08:05:55 (UTC+09:00)
>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>>>
>>>>>>>>
>>>>>>>> Why would you need to version table descriptions? Are there cases
>>>>>>>> where they are changing rapidly and inaccurate due to schema changes?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>> Thank you for your reply.
>>>>>>>>
>>>>>>>> Column-level comments are already part of the schema definition.
>>>>>>>> Would adding just one table-level comment really cause noticeable
>>>>>>>> bloat? For example, if a table has 20 columns, adding one more comment
>>>>>>>> would only increase the metadata size by about 1/20th.
>>>>>>>>
>>>>>>>> Also, using schema-id as part of the property key feels like a
>>>>>>>> workaround rather than a proper solution. It is not part of the
>>>>>>>> specification, so any tool or integration (including LLM-based ones)
>>>>>>>> would need extra logic to interpret it. A standardized, schema-level
>>>>>>>> field would avoid that complexity and make the metadata easier to
>>>>>>>> consume consistently.
>>>>>>>>
>>>>>>>> If bloat is a real concern, perhaps column-level comments should
>>>>>>>> also be moved out of the schema, with a proper mechanism to version and
>>>>>>>> manage them separately.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Taeyun.
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: "Gang Wu" <[email protected]>
>>>>>>>> To: <[email protected]>;
>>>>>>>> Cc:
>>>>>>>> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>>>
>>>>>>>>
>>>>>>>> I'd rather not complicate the schema definitions in the table
>>>>>>>> metadata. You may append `schema-id` to the table property key to
>>>>>>>> manage different schema versions.
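
One way to read the suggestion above, as a sketch only: the key format
`comment.schema-id.<id>` and the helper names below are assumptions made
for illustration, since the thread does not define a key layout.

```python
def comment_key(schema_id: int) -> str:
    # Hypothetical key format; nothing in the spec or thread defines it.
    return f"comment.schema-id.{schema_id}"


def comment_for_schema(properties: dict, schema_id: int):
    """Return the schema-specific comment, falling back to the plain `comment` key."""
    return properties.get(comment_key(schema_id), properties.get("comment"))


props = {
    "comment": "Orders table",
    comment_key(2): "Orders table (adds currency column)",
}
print(comment_for_schema(props, 2))
print(comment_for_schema(props, 1))
```

Readers that do not know the key convention would only ever see the plain
`comment` value.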
>>>>>>>>
>>>>>>>>
>>>>>>>> Storing verbose text for each field may bloat the metadata storage,
>>>>>>>> especially when there are a lot of duplicate `doc`s if schema evolution
>>>>>>>> happens a lot.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Gang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>> Thank you for your response.
>>>>>>>> As I understand it, the table description is currently stored as a
>>>>>>>> table property within the table metadata’s `properties` map.
>>>>>>>>
>>>>>>>> In my opinion, this approach has a few issues:
>>>>>>>>
>>>>>>>> - Table metadata `properties` are not versioned. As a result, when
>>>>>>>> querying an older snapshot, the description may be inaccurate because
>>>>>>>> the value reflects only the current state.
>>>>>>>> - According to the specification, the purpose of table metadata
>>>>>>>> properties is: “A string to string map of table properties. This is
>>>>>>>> used to control settings that affect reading and writing and is not
>>>>>>>> intended to be used for arbitrary metadata.” Based on this, a comment
>>>>>>>> seems to fall under “arbitrary metadata,” and therefore may not be an
>>>>>>>> appropriate use of properties.
>>>>>>>> - Table comments seem to have become significant enough that
>>>>>>>> relying on a convention alone may no longer be sufficient. It might be
>>>>>>>> worth considering a standardized, schema-level field for them.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>> Taeyun
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>>>> To: <[email protected]>;
>>>>>>>> Cc:
>>>>>>>> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
>>>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>>>
>>>>>>>>
>>>>>>>> Iceberg does allow you to store table descriptions. The convention
>>>>>>>> is to use a table property, "comment". While this isn't a schema-level
>>>>>>>> doc/comment, I don't know of anything that makes a distinction between
>>>>>>>> schema description and table description, so I think it should work for
>>>>>>>> your use.
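
A minimal sketch of the convention described above, assuming a heavily
trimmed metadata document (real metadata files carry many more fields) and
a hypothetical helper named `table_description`:

```python
import json

# Trimmed, hypothetical table metadata used only for illustration.
metadata_json = """
{
  "format-version": 2,
  "properties": {
    "comment": "One row per customer order"
  }
}
"""


def table_description(metadata: dict):
    """Return the description stored under the `comment` property, if any."""
    return metadata.get("properties", {}).get("comment")


metadata = json.loads(metadata_json)
print(table_description(metadata))
```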
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> With the growing trend of using LLMs to automatically generate SQL,
>>>>>>>> it feels increasingly important to manage descriptions of database
>>>>>>>> tables and columns in a way that these tools can easily access.
>>>>>>>>
>>>>>>>> In the Iceberg specification, comments for schema fields (i.e.,
>>>>>>>> columns) can be specified using the `doc` property within the `fields`
>>>>>>>> array of a `struct` type. However, there doesn’t seem to be a way to
>>>>>>>> specify a comment for the root struct type itself - that is, for the
>>>>>>>> table as a whole.
>>>>>>>>
>>>>>>>> From what I can tell, OLAP DBMSs today may handle table-level
>>>>>>>> comments by storing them in the `properties` map within the table
>>>>>>>> metadata under various non-standard keys. But since a table comment
>>>>>>>> conceptually belongs to the schema, and can vary by schema, it feels
>>>>>>>> like the `properties` map within the table metadata might not be the
>>>>>>>> best place for it.
>>>>>>>>
>>>>>>>> Would it make sense to allow a `doc` property on the `schema`
>>>>>>>> object (the root struct type), alongside `schema-id` and
>>>>>>>> `identifier-field-ids`, so that a description for the schema itself
>>>>>>>> can be included?
>>>>>>>> It seems like it would be helpful, especially for tooling and
>>>>>>>> LLM-related use cases.
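
To illustrate the proposal above, here is a hedged sketch. The field-level
`doc` entries follow the existing schema layout, while the top-level `doc`
is the proposed addition and is not part of the current spec. The column
names, descriptions, and the `describe` helper are made up for the example.

```python
# Field-level `doc` exists today; the top-level `doc` is only the proposal
# from this message and is NOT part of the current spec.
schema = {
    "type": "struct",
    "schema-id": 1,
    "identifier-field-ids": [1],
    "doc": "One row per customer order",  # proposed, hypothetical
    "fields": [
        {"id": 1, "name": "order_id", "required": True, "type": "long",
         "doc": "Unique order identifier"},
        {"id": 2, "name": "amount", "required": False, "type": "decimal(10,2)",
         "doc": "Order total in the order currency"},
    ],
}


def describe(schema: dict) -> str:
    """Render table and column descriptions as plain text, e.g. for an LLM prompt."""
    lines = [f"table: {schema.get('doc', '(no description)')}"]
    for f in schema["fields"]:
        lines.append(f"  {f['name']} ({f['type']}): {f.get('doc', '')}")
    return "\n".join(lines)


print(describe(schema))
```

A consumer that only reads the schema would then get both table and column
descriptions from one place.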
>>>>>>>>
>>>>>>>> Curious to hear your thoughts.
>>>>>>>> Apologies if I’m overlooking something or if this has already been
>>>>>>>> discussed.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Taeyun
>>>>>>>
>>>>>>>
