Sure - I opened a PR here: https://github.com/apache/iceberg/pull/15367

On Thu, Feb 19, 2026 at 7:02 AM Steven Wu <[email protected]> wrote:

> It seems that we have consensus to standardize and document the
> "comment" table property. It is useful for providing semantic context,
> which is super helpful to LLMs. This is also how popular engines like Spark
> and Trino store the `comment` string from "CREATE TABLE" DDL.
>
> Taeyun/Guy, let us know if you are interested in creating a PR for that.
>
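A minimal sketch of how a consumer, such as an LLM-based SQL generator, might read the table description under this convention, assuming it already has a parsed metadata.json. The `properties` map comes from the table metadata spec; the helper name `table_comment` and the sample values are hypothetical.

```python
import json
from typing import Optional

# Hypothetical, trimmed-down metadata.json containing only what this sketch reads.
SAMPLE_METADATA = """
{
  "format-version": 2,
  "properties": {
    "comment": "Daily order events, one row per order line",
    "write.format.default": "parquet"
  }
}
"""


def table_comment(metadata: dict) -> Optional[str]:
    """Return the table description stored under the "comment" property, if any."""
    # "properties" may be absent from table metadata, so default to an empty map.
    return metadata.get("properties", {}).get("comment")


if __name__ == "__main__":
    print(table_comment(json.loads(SAMPLE_METADATA)))
```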
> On Thu, Aug 7, 2025 at 12:08 PM Ryan Blue <[email protected]> wrote:
>
>> I think it's probably a good idea to add more implementation-specific
>> details to the spec, like the use of "comment" for table documentation. We
>> recently added a section for this that makes clear these are not required
>> but are important conventions.
>>
>> I would not add "owner" to that section. Storing owner in table
>> properties is not a good idea because it would either need to be controlled
>> and overridden by catalogs or would be informational and untrustworthy. I
>> think that owner is part of catalog metadata, not table metadata.
>>
>> On Thu, Aug 7, 2025 at 9:38 AM Guy Yasoor <[email protected]>
>> wrote:
>>
>>> Got it - I now better understand the meaning of "reserved table
>>> properties", and I agree it shouldn't be touched or expanded.
>>>
>>> Going back to the original topic:
>>> It appears that both `comment` and `owner` are important fields, which
>>> are populated by some engines, and can prove useful for others, but aren't
>>> standardized anywhere in the spec.
>>> To improve engine alignment, I think they should be documented
>>> somewhere.
>>> I'd suggest one of two approaches:
>>>
>>>    1. Keep them in the table properties map and document them in the
>>>    Table Properties documentation
>>>    <https://iceberg.apache.org/docs/latest/configuration/#table-properties>
>>>    (but not in the reserved section; perhaps they deserve their own
>>>    section, "Table context properties"?).
>>>    2. Add them as optional top-level fields in the metadata.json schema.
>>>    This might be the "best practice" (especially if `owner` is supposed
>>>    to be controlled by the catalog). However, it would require changing
>>>    Spark's current behavior, both for `owner` assignment and for `comment`
>>>    assignment in "CREATE TABLE ... COMMENT 'table documentation'".
>>>
>>> WDYT?
>>>
>>>
>>> On Tue, Aug 5, 2025 at 8:08 PM Ryan Blue <[email protected]> wrote:
>>>
>>>> The `format-version` table property is different because it is mapped
>>>> to the table's format version, which is not stored in table properties.
>>>> It is reserved because implementations will override it, so it isn't a
>>>> real table property. This is not a pattern that we want to expand because
>>>> of the strange behavior.
>>>>
>>>> For cases like `comment`, these other properties are normal table
>>>> properties that can be used like any other. If the schema had a doc string
>>>> and that was used in place of `comment`, then I think it would be a
>>>> reserved property. But there's no need for that because setting the
>>>> property or using `COMMENT ON` would have the same behavior -- changing the
>>>> property value.
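To make the distinction concrete, here is a simplified sketch (not Iceberg's actual code) of how an implementation might apply a property update over a dict-based model of metadata.json. The reserved `format-version` key is consumed and written to the top-level format version instead of the `properties` map, while `comment` is stored like any other property. The function name is hypothetical, and this sketch also rejects downgrades as an assumed safety check.

```python
from typing import Dict


def apply_property_updates(metadata: Dict, updates: Dict[str, str]) -> Dict:
    """Return a copy of `metadata` with property `updates` applied.

    Hypothetical helper; only "format-version" is treated as reserved here.
    """
    result = dict(metadata)
    props = dict(metadata.get("properties", {}))
    for key, value in updates.items():
        if key == "format-version":
            # Reserved: mapped onto the top-level field, never kept as a property.
            new_version = int(value)
            if new_version < result.get("format-version", 1):
                raise ValueError("format version downgrades are not allowed")
            result["format-version"] = new_version
        else:
            # Ordinary properties such as "comment" are stored as-is.
            props[key] = value
    result["properties"] = props
    return result
```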
>>>>
>>>> The `owner` property is a different case. Owner is something that
>>>> should be restricted. A user should not be able to change it with just
>>>> access to modify table metadata. Tracking a table's owner is the
>>>> responsibility of the catalog and its access control scheme. Because of
>>>> this, I don't think that we should standardize or encourage setting an
>>>> `owner` table property.
>>>>
>>>> On Tue, Aug 5, 2025 at 4:21 AM Guy Yasoor <[email protected]>
>>>> wrote:
>>>>
>>>>> If using "comment" is the best practice, should we add it to the
>>>>> "reserved table properties" docs
>>>>> <https://iceberg.apache.org/docs/latest/configuration/#reserved-table-properties>,
>>>>> to make sure it's aligned between different engines and implementations?
>>>>> At the same time, I would suggest adding "owner" as well, which is
>>>>> automatically added by Spark.
>>>>>
>>>>> On Tue, Aug 5, 2025 at 2:16 AM Taeyun Kim <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I see, thank you for your response.
>>>>>>
>>>>>> Best regards,
>>>>>> Taeyun
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>> To: <[email protected]>;
>>>>>> Cc:
>>>>>> Sent: 2025-08-05 (Tue) 07:45:43 (UTC+09:00)
>>>>>> Subject: Re: Re: Thoughts on Adding a `doc` Property for Schema
>>>>>> Objects
>>>>>>
>>>>>>
>>>>>> If there isn't a significant difference between table-level
>>>>>> description and schema-level description, then I think you should
>>>>>> consider it standardized. You can store the table description in the
>>>>>> "comment" table property.
>>>>>>
>>>>>>
>>>>>> On Sun, Aug 3, 2025 at 5:28 PM Taeyun Kim <
>>>>>> [email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I’ve already explained my reasoning in earlier messages, including
>>>>>> the example about making table and column descriptions more accessible
>>>>>> for LLM-generated SQL.
>>>>>> From my perspective, table-level comments, like column-level
>>>>>> comments, should also be standardized.
>>>>>> If standardized, it seems natural for them to be part of the schema
>>>>>> definition, just like column-level comments.
>>>>>> This way, they stay consistent with the schema version and avoid
>>>>>> drifting out of sync when the schema changes.
>>>>>>
>>>>>> Thanks,
>>>>>> Taeyun
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>> To: <[email protected]>;
>>>>>> Cc:
>>>>>> Sent: 2025-07-26 (Sat) 08:05:55 (UTC+09:00)
>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>
>>>>>>
>>>>>> Why would you need to version table descriptions? Are there cases
>>>>>> where they are changing rapidly and inaccurate due to schema changes?
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> Thank you for your reply.
>>>>>>
>>>>>> Column-level comments are already part of the schema definition.
>>>>>> Would adding just one table-level comment really cause noticeable bloat?
>>>>>> For example, if a table has 20 columns, adding one more comment would
>>>>>> only increase the metadata size by about 1/20th.
>>>>>>
>>>>>> Also, using schema-id as part of the property key feels like a
>>>>>> workaround rather than a proper solution. It is not part of the
>>>>>> specification, so any tool or integration (including LLM-based ones)
>>>>>> would need extra logic to interpret it. A standardized, schema-level
>>>>>> field would avoid that complexity and make the metadata easier to
>>>>>> consume consistently.
>>>>>>
>>>>>> If bloat is a real concern, perhaps column-level comments should also
>>>>>> be moved out of the schema, with a proper mechanism to version and manage
>>>>>> them separately.
>>>>>>
>>>>>> Thank you,
>>>>>> Taeyun.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "Gang Wu" <[email protected]>
>>>>>> To: <[email protected]>;
>>>>>> Cc:
>>>>>> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>
>>>>>>
>>>>>> I'd rather not complicate the schema definitions in the table
>>>>>> metadata. You may append `schema-id` to the key of a table property to
>>>>>> manage different schema versions.
>>>>>>
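A rough sketch of the `schema-id` key idea, assuming a key layout of `comment.schema-id.<id>`. The thread does not specify a layout, so that format and both helper names are hypothetical. A reader looks up the per-schema key first and falls back to the plain "comment" property.

```python
from typing import Dict, Optional


def comment_key(schema_id: int) -> str:
    # Hypothetical key layout; nothing in the spec defines it.
    return f"comment.schema-id.{schema_id}"


def comment_for_schema(properties: Dict[str, str], schema_id: int) -> Optional[str]:
    """Prefer a per-schema comment, falling back to the plain "comment" property."""
    return properties.get(comment_key(schema_id), properties.get("comment"))
```

Every consumer has to know this key layout, which is the extra interpretation logic objected to in Taeyun's reply.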
>>>>>>
>>>>>> Storing verbose text in each field may bloat the metadata storage,
>>>>>> especially when schema evolution happens often and produces many
>>>>>> duplicate `doc`s.
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Gang
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> Thank you for your response.
>>>>>> As I understand it, the table description is currently stored as a
>>>>>> table property within the table metadata’s `properties` map.
>>>>>>
>>>>>> In my opinion, this approach has a few issues:
>>>>>>
>>>>>> - Table metadata `properties` are not versioned. As a result, when
>>>>>> querying an older snapshot, the description may be inaccurate because the
>>>>>> value reflects only the current state.
>>>>>> - According to the specification, the purpose of table metadata
>>>>>> properties is: “A string to string map of table properties. This is used
>>>>>> to control settings that affect reading and writing and is not intended
>>>>>> to be used for arbitrary metadata.” Based on this, a comment seems to fall
>>>>>> under “arbitrary metadata,” and therefore may not be an appropriate use of
>>>>>> properties.
>>>>>> - Table comments seem to have become significant enough that relying
>>>>>> on a convention alone may no longer be sufficient. It might be worth
>>>>>> considering a standardized, schema-level field for them.
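The first point can be illustrated with a small sketch over a dict-based model of metadata.json; the helper names and sample data are hypothetical. Column `doc`s live in the schema, and a snapshot can record the `schema-id` that was current when it was written, so column docs can be resolved per snapshot. The "comment" property exists only once in the metadata, so every snapshot sees its current value.

```python
from typing import Dict


def schema_for_snapshot(metadata: Dict, snapshot_id: int) -> Dict:
    """Find the schema that was current when the given snapshot was written."""
    # Raises StopIteration if the snapshot or schema is missing (fine for a sketch).
    snapshot = next(s for s in metadata["snapshots"] if s["snapshot-id"] == snapshot_id)
    # "schema-id" is optional on snapshots; fall back to the current schema.
    schema_id = snapshot.get("schema-id", metadata["current-schema-id"])
    return next(s for s in metadata["schemas"] if s["schema-id"] == schema_id)


def describe_snapshot(metadata: Dict, snapshot_id: int) -> Dict:
    schema = schema_for_snapshot(metadata, snapshot_id)
    return {
        # Field docs travel with the schema, so they match the snapshot.
        "columns": {f["name"]: f.get("doc") for f in schema["fields"]},
        # The "comment" property has no history: this is always the latest value.
        "table": metadata.get("properties", {}).get("comment"),
    }
```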
>>>>>>
>>>>>> Thank you.
>>>>>> Taeyun
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>> To: <[email protected]>;
>>>>>> Cc:
>>>>>> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>
>>>>>>
>>>>>> Iceberg does allow you to store table descriptions. The convention is
>>>>>> to use a table property, "comment". While this isn't a schema-level
>>>>>> doc/comment, I don't know of anything that makes a distinction between
>>>>>> schema description and table description, so I think it should work for
>>>>>> your use.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> With the growing trend of using LLMs to automatically generate SQL,
>>>>>> it feels increasingly important to manage descriptions of database tables
>>>>>> and columns in a way that these tools can easily access.
>>>>>>
>>>>>> In the Iceberg specification, comments for schema fields (i.e.,
>>>>>> columns) can be specified using the `doc` property within the `fields`
>>>>>> array of a `struct` type. However, there doesn’t seem to be a way to
>>>>>> specify a comment for the root struct type itself - that is, for the
>>>>>> table as a whole.
>>>>>>
>>>>>> From what I can tell, OLAP DBMSs today may handle table-level
>>>>>> comments by storing them in the `properties` map within the table
>>>>>> metadata under various non-standard keys. But since a table comment
>>>>>> conceptually belongs to the schema, and can vary by schema, it feels like
>>>>>> the `properties` map within the table metadata might not be the best
>>>>>> place for it.
>>>>>>
>>>>>> Would it make sense to allow a `doc` property on the `schema` object
>>>>>> (the root struct type), alongside `schema-id` and `identifier-field-ids`,
>>>>>> so that a description for the schema itself can be included?
>>>>>> It seems like it would be helpful, especially for tooling and
>>>>>> LLM-related use cases.
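To make the proposal concrete, here is a sketch of what a schema object might look like with a top-level `doc`, plus a reader that prefers it and falls back to the "comment" property. The schema-level `doc` key is the hypothetical addition under discussion, not part of the current spec; the helper name `table_description` is also an assumption.

```python
from typing import Dict, Optional

# A schema object as it might look under the proposal. The top-level "doc" key
# is NOT part of the current Iceberg spec; the other keys follow the spec.
PROPOSED_SCHEMA = {
    "type": "struct",
    "schema-id": 0,
    "identifier-field-ids": [1],
    "doc": "One row per customer order",  # proposed, hypothetical
    "fields": [
        {"id": 1, "name": "order_id", "required": True, "type": "long",
         "doc": "Unique order identifier"},
    ],
}


def table_description(schema: Dict, properties: Dict[str, str]) -> Optional[str]:
    """Prefer the (proposed) schema-level doc, else the "comment" property.

    An empty schema doc string is treated as missing.
    """
    return schema.get("doc") or properties.get("comment")
```

Falling back to "comment" would keep tables written under the current convention readable if such a field were ever added.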
>>>>>>
>>>>>> Curious to hear your thoughts.
>>>>>> Apologies if I’m overlooking something or if this has already been
>>>>>> discussed.
>>>>>>
>>>>>> Thank you,
>>>>>> Taeyun
>>>>>
>>>>>
