Hello everyone,

I've been working on the AWS Glue catalog in iceberg-go and hit a concrete problem: for tables with very large schemas (~3600 fields), CreateTable/UpdateTable calls fail because the request exceeds Glue's API payload size limit. The cause is that all schema columns are unconditionally written to StorageDescriptor.Columns on every operation.

This has come up in the Java implementation before: issue #7584 was opened in May 2023, followed by two PRs (#11334 and #12664), both proposing a `glue.non-current-fields-disabled` property. All three were closed due to inactivity without reaching consensus. I checked all implementations (Java, Python, Go, and Rust), and none exposes a property to skip columns in the StorageDescriptor today.

I propose adding `glue.schema-columns` (default: `true`) across all implementations, following the same convention as existing `glue.*` properties [1]. When set to `false`, columns are omitted from the StorageDescriptor [2]. Since StorageDescriptor.Columns is used by Lake Formation for column-level access control [3], the catalog should fail fast when `glue.lakeformation-enabled=true` is set together with `glue.schema-columns=false`. A Go implementation with the LF guard is at [4].

A few things I'd like to align on:

- Is `glue.schema-columns` the right name, or should we revive the naming from [5] and [6]?
- Should this be a coordinated cross-implementation effort, or is it fine to merge per-implementation under the `glue.*` namespace independently?

[1]: https://github.com/apache/iceberg/blob/ed8a16bbeb549b0286d3c229beb5a0cf165f2f4b/aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java#L70
[2]: https://docs.aws.amazon.com/glue/latest/webapi/API_StorageDescriptor.html
[3]: https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html
[4]: https://github.com/apache/iceberg-go/pull/769
[5]: https://github.com/apache/iceberg/pull/11334
[6]: https://github.com/apache/iceberg/pull/12664

Andrei
