Hello everyone,

I've been working on the AWS Glue catalog in iceberg-go and hit a concrete
problem: for tables with very large schemas (~3600 fields),
CreateTable/UpdateTable calls fail because the request exceeds Glue's API
payload size limit. The cause is that all schema columns are unconditionally
written to StorageDescriptor.Columns on every operation.

This has come up in the Java implementation before — issue #7584 was opened
in May 2023, followed by two PRs (#11334 and #12664) both proposing a
`glue.non-current-fields-disabled` property. All three were closed due to
inactivity without reaching consensus.

I checked all four implementations (Java, Python, Go, and Rust), and none of
them exposes a property to skip columns in the StorageDescriptor today.

I propose adding `glue.schema-columns` (default: `true`) across all
implementations, following the same convention as existing `glue.*`
properties [1]. When set to `false`, columns are omitted from the
StorageDescriptor [2]. Since StorageDescriptor.Columns is used by Lake
Formation for column-level access control [3], the catalog should fail fast
when `glue.schema-columns=false` is set together with
`glue.lakeformation-enabled=true`.

A Go implementation with the LF guard is at [4].
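
To make the intended behavior concrete, here is a minimal, self-contained Go
sketch of the property handling and the fail-fast check. It is an
illustration only and is not the code in [4]: the `column` type and the
`boolProp` and `descriptorColumns` helpers are hypothetical names, and the
`false` default for `glue.lakeformation-enabled` is an assumption made for
this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Property keys as named in this thread. They are not claimed to match
// constants that exist in any implementation today.
const (
	schemaColumnsKey = "glue.schema-columns"
	lakeFormationKey = "glue.lakeformation-enabled"
)

// errColumnsRequired is returned when the two properties conflict.
var errColumnsRequired = errors.New(
	"glue.schema-columns=false cannot be combined with glue.lakeformation-enabled=true")

// column is a hypothetical stand-in for one StorageDescriptor column entry.
type column struct {
	Name string
	Type string
}

// boolProp reads a boolean property, falling back to def when the key is unset.
func boolProp(props map[string]string, key string, def bool) (bool, error) {
	raw, ok := props[key]
	if !ok {
		return def, nil
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("invalid value %q for %s: %w", raw, key, err)
	}
	return v, nil
}

// descriptorColumns decides which columns go into StorageDescriptor.Columns.
// It returns all columns by default, nil when glue.schema-columns=false, and
// an error when columns are disabled while Lake Formation is enabled.
func descriptorColumns(props map[string]string, all []column) ([]column, error) {
	writeColumns, err := boolProp(props, schemaColumnsKey, true)
	if err != nil {
		return nil, err
	}
	// Assumption for this sketch: Lake Formation is off unless enabled.
	lakeFormation, err := boolProp(props, lakeFormationKey, false)
	if err != nil {
		return nil, err
	}
	if !writeColumns && lakeFormation {
		return nil, errColumnsRequired
	}
	if !writeColumns {
		return nil, nil
	}
	return all, nil
}
```

A catalog would call something like `descriptorColumns` once while building
the table input for CreateTable/UpdateTable, so a conflicting configuration
is rejected before any request is sent to Glue.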

A few things I'd like to align on:
- Is `glue.schema-columns` the right name, or should we revive the naming
from [5] and [6]?
- Should this be a coordinated cross-implementation effort, or is it fine
to merge per-implementation under the `glue.*` namespace independently?

[1]:
https://github.com/apache/iceberg/blob/ed8a16bbeb549b0286d3c229beb5a0cf165f2f4b/aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java#L70
[2]:
https://docs.aws.amazon.com/glue/latest/webapi/API_StorageDescriptor.html
[3]: https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html
[4]: https://github.com/apache/iceberg-go/pull/769
[5]: https://github.com/apache/iceberg/pull/11334
[6]: https://github.com/apache/iceberg/pull/12664

Andrei
