Sure - I opened a PR here: https://github.com/apache/iceberg/pull/15367
On Thu, Feb 19, 2026 at 7:02 AM Steven Wu <[email protected]> wrote:

> It seems that we have a consensus to standardize and document the
> "comment" table property. It is useful to provide the semantic context
> that is super helpful to LLMs. This is also how popular engines like Spark
> and Trino store the `comment` string from "CREATE TABLE" DDL.
>
> Taeyun/Guy, let us know if you are interested in creating a PR for that.
>
> On Thu, Aug 7, 2025 at 12:08 PM Ryan Blue <[email protected]> wrote:
>
>> I think it's probably a good idea to add more implementation-specific
>> details to the spec, like the use of "comment" for table documentation. We
>> recently added a section for this that is clear that these are not required
>> but are important conventions.
>>
>> I would not add "owner" to that section. Storing owner in table
>> properties is not a good idea because it would either need to be controlled
>> and overridden by catalogs or would be informational and untrustworthy. I
>> think that owner is part of catalog metadata, not table metadata.
>>
>> On Thu, Aug 7, 2025 at 9:38 AM Guy Yasoor <[email protected]>
>> wrote:
>>
>>> Got it - I now understand better the meaning of "reserved table
>>> properties", and I agree it shouldn't be touched or expanded.
>>>
>>> Going back to the original topic:
>>> It appears that both `comment` and `owner` are important fields, which
>>> are populated by some engines, and can prove useful for others, but aren't
>>> standardized anywhere in the spec.
>>> To improve engine alignment, I think they should be documented
>>> somewhere.
>>> I'd suggest one of two approaches:
>>>
>>>    1. Either keeping them in the table properties map, and documenting
>>>       it in the Table Properties documentation
>>>       <https://iceberg.apache.org/docs/latest/configuration/#table-properties>
>>>       (but not in the reserved section - perhaps it deserves its own
>>>       section, "Table context properties"?)
>>>    2. Or adding them as optional top-level fields in the metadata.json
>>>       schema - this might be the "best practice" (especially if `owner` is
>>>       supposed to be controlled by the catalog). However, it will require
>>>       changing the current behavior of Spark, both for `owner` assignment,
>>>       and for `comment` assignment in "CREATE TABLE ... COMMENT 'table
>>>       documentation'".
>>>
>>> WDYT?
>>>
>>> On Tue, Aug 5, 2025 at 8:08 PM Ryan Blue <[email protected]> wrote:
>>>
>>>> The `format-version` table property is different because it is mapped
>>>> to the format version that is not stored in table properties. It is
>>>> reserved because implementations will override it and so it isn't a real
>>>> table property. This is not a pattern that we want to expand because of
>>>> the strange behavior.
>>>>
>>>> For cases like `comment`, these other properties are normal table
>>>> properties that can be used like any other. If the schema had a doc string
>>>> and that was used in place of `comment`, then I think it would be a
>>>> reserved property. But there's no need for that because setting the
>>>> property or using `COMMENT ON` would have the same behavior -- changing
>>>> the property value.
>>>>
>>>> The `owner` property is a different case. Owner is something that
>>>> should be restricted. A user should not be able to change it with just
>>>> access to modify table metadata. Tracking a table's owner is the
>>>> responsibility of the catalog and its access control scheme. Because of
>>>> this, I don't think that we should standardize or encourage setting an
>>>> `owner` table property.
>>>>
>>>> On Tue, Aug 5, 2025 at 4:21 AM Guy Yasoor <[email protected]>
>>>> wrote:
>>>>
>>>>> If using "comment" is the best practice, should we add this to the
>>>>> "reserved table properties" docs
>>>>> <https://iceberg.apache.org/docs/latest/configuration/#reserved-table-properties>,
>>>>> to make sure it's aligned between different engines and implementations?
>>>>> In the same opportunity, I would suggest adding "owner" as well, which
>>>>> is automatically added by Spark.
>>>>>
>>>>> On Tue, Aug 5, 2025 at 2:16 AM Taeyun Kim <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I see, thank you for your response.
>>>>>>
>>>>>> Best regards,
>>>>>> Taeyun
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>> To: <[email protected]>;
>>>>>> Cc:
>>>>>> Sent: 2025-08-05 (Tue) 07:45:43 (UTC+09:00)
>>>>>> Subject: Re: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>
>>>>>> If there isn't a significant difference between table-level
>>>>>> description and schema-level description, then I think you should
>>>>>> consider it standardized. You can store the table description in the
>>>>>> "comment" table property.
>>>>>>
>>>>>> On Sun, Aug 3, 2025 at 5:28 PM Taeyun Kim <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I’ve already explained my reasoning in earlier messages, including
>>>>>> the example about making table and column descriptions more accessible
>>>>>> for LLM-generated SQL.
>>>>>> From my perspective, table-level comments, like column-level
>>>>>> comments, should also be standardized.
>>>>>> If standardized, it seems natural for them to be part of the schema
>>>>>> definition, just like column-level comments.
>>>>>> This way, they stay consistent with the schema version and avoid
>>>>>> drifting out of sync when the schema changes.
>>>>>>
>>>>>> Thanks,
>>>>>> Taeyun
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>> To: <[email protected]>;
>>>>>> Cc:
>>>>>> Sent: 2025-07-26 (Sat) 08:05:55 (UTC+09:00)
>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>
>>>>>> Why would you need to version table descriptions? Are there cases
>>>>>> where they are changing rapidly and inaccurate due to schema changes?
>>>>>>
>>>>>> On Thu, Jul 24, 2025 at 7:48 PM Taeyun Kim <[email protected]> wrote:
>>>>>>
>>>>>> Thank you for your reply.
>>>>>>
>>>>>> Column-level comments are already part of the schema definition.
>>>>>> Would adding just one table-level comment really cause noticeable bloat?
>>>>>> For example, if a table has 20 columns, adding one more comment would
>>>>>> only increase the metadata size by about 1/20th.
>>>>>>
>>>>>> Also, using schema-id as part of the property key feels like a
>>>>>> workaround rather than a proper solution. It is not part of the
>>>>>> specification, so any tool or integration (including LLM-based ones)
>>>>>> would need extra logic to interpret it. A standardized, schema-level
>>>>>> field would avoid that complexity and make the metadata easier to
>>>>>> consume consistently.
>>>>>>
>>>>>> If bloat is a real concern, perhaps column-level comments should also
>>>>>> be moved out of the schema, with a proper mechanism to version and
>>>>>> manage them separately.
>>>>>>
>>>>>> Thank you,
>>>>>> Taeyun.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "Gang Wu" <[email protected]>
>>>>>> To: <[email protected]>;
>>>>>> Cc:
>>>>>> Sent: 2025-07-25 (Fri) 11:20:08 (UTC+09:00)
>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>
>>>>>> I'd rather not complicate the schema definitions in the table
>>>>>> metadata. You may append `schema-id` to the key of table property to
>>>>>> manage different schema versions.
>>>>>>
>>>>>> Storing verbose text to each field may bloat the metadata storage,
>>>>>> especially when there are a lot of duplicate `doc`s if schema evolution
>>>>>> happens a lot.
>>>>>>
>>>>>> Best,
>>>>>> Gang
>>>>>>
>>>>>> On Fri, Jul 25, 2025 at 9:25 AM Taeyun Kim <[email protected]> wrote:
>>>>>>
>>>>>> Thank you for your response.
>>>>>> As I understand it, the table description is currently stored as a
>>>>>> table property within the table metadata’s `properties` map.
>>>>>>
>>>>>> In my opinion, this approach has a few issues:
>>>>>>
>>>>>> - Table metadata `properties` are not versioned. As a result, when
>>>>>>   querying an older snapshot, the description may be inaccurate because
>>>>>>   the value reflects only the current state.
>>>>>> - According to the specification, the purpose of table metadata
>>>>>>   properties is: “A string to string map of table properties. This is
>>>>>>   used to control settings that affect reading and writing and is not
>>>>>>   intended to be used for arbitrary metadata.” Based on this, a comment
>>>>>>   seems to fall under “arbitrary metadata,” and therefore may not be an
>>>>>>   appropriate use of properties.
>>>>>> - Table comments seem to have become significant enough that relying
>>>>>>   on a convention alone may no longer be sufficient. It might be worth
>>>>>>   considering a standardized, schema-level field for them.
>>>>>>
>>>>>> Thank you.
>>>>>> Taeyun
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: "Ryan Blue" <[email protected]>
>>>>>> To: <[email protected]>;
>>>>>> Cc:
>>>>>> Sent: 2025-07-25 (Fri) 08:48:48 (UTC+09:00)
>>>>>> Subject: Re: Thoughts on Adding a `doc` Property for Schema Objects
>>>>>>
>>>>>> Iceberg does allow you to store table descriptions. The convention is
>>>>>> to use a table property, "comment". While this isn't a schema-level
>>>>>> doc/comment, I don't know of anything that makes a distinction between
>>>>>> schema description and table description, so I think it should work for
>>>>>> your use.
>>>>>>
>>>>>> On Tue, Jul 22, 2025 at 5:48 PM 김태연 (Taeyun Kim) <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> With the growing trend of using LLMs to automatically generate SQL,
>>>>>> it feels increasingly important to manage descriptions of database
>>>>>> tables and columns in a way that these tools can easily access.
>>>>>>
>>>>>> In the Iceberg specification, comments for schema fields (i.e.,
>>>>>> columns) can be specified using the `doc` property within the `fields`
>>>>>> array of a `struct` type. However, there doesn’t seem to be a way to
>>>>>> specify a comment for the root struct type itself - that is, for the
>>>>>> table as a whole.
>>>>>>
>>>>>> From what I can tell, OLAP DBMSs today may handle table-level
>>>>>> comments by storing them in the `properties` map within the table
>>>>>> metadata under various non-standard keys. But since a table comment
>>>>>> conceptually belongs to the schema, and can vary by schema, it feels
>>>>>> like the `properties` map within the table metadata might not be the
>>>>>> best place for it.
>>>>>>
>>>>>> Would it make sense to allow a `doc` property on the `schema` object
>>>>>> (the root struct type), alongside `schema-id` and `identifier-field-ids`,
>>>>>> so that a description for the schema itself can be included?
>>>>>> It seems like it would be helpful, especially for tooling and
>>>>>> LLM-related use cases.
>>>>>>
>>>>>> Curious to hear your thoughts.
>>>>>> Apologies if I’m overlooking something or if this has already been
>>>>>> discussed.
>>>>>>
>>>>>> Thank you,
>>>>>> Taeyun
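
For anyone skimming the thread, the convention discussed above (the table description in the `comment` table property, column descriptions in each field's `doc`) can be sketched roughly as below. This is only a minimal illustration, assuming a heavily trimmed-down metadata.json. The sample table, its values, and the `describe_table` helper are all hypothetical and not part of any Iceberg library.

```python
import json

# Hypothetical, trimmed-down table metadata. A real metadata.json carries
# many more fields; only the ones discussed in this thread are shown.
METADATA_JSON = """
{
  "format-version": 2,
  "current-schema-id": 1,
  "properties": {"comment": "Daily order facts"},
  "schemas": [
    {"schema-id": 0, "type": "struct", "fields": [
      {"id": 1, "name": "order_id", "required": true, "type": "long"}
    ]},
    {"schema-id": 1, "type": "struct", "fields": [
      {"id": 1, "name": "order_id", "required": true, "type": "long",
       "doc": "Unique order identifier"},
      {"id": 2, "name": "amount", "required": false, "type": "double"}
    ]}
  ]
}
"""


def describe_table(metadata: dict) -> dict:
    """Collect the table comment and per-column docs (hypothetical helper).

    The table comment comes from the unversioned `properties` map, while
    column docs come from the fields of the current schema.
    """
    props = metadata.get("properties", {})
    current_id = metadata.get("current-schema-id")
    schema = next(
        (s for s in metadata.get("schemas", []) if s.get("schema-id") == current_id),
        None,
    )
    fields = schema["fields"] if schema else []
    return {
        "table_comment": props.get("comment"),
        "column_docs": {f["name"]: f.get("doc") for f in fields},
    }


description = describe_table(json.loads(METADATA_JSON))
print(description)
```

Note that the comment is read from a single `properties` map regardless of which schema is current, which is the "not versioned" behavior raised earlier in the thread.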
