dhallam opened a new issue, #3032:
URL: https://github.com/apache/iceberg-python/issues/3032
### Question
## Versions:
pyiceberg == 0.11.0
Glue 5.1
Spark 3.5.6
Python 3.13
Amazon S3 (not S3Tables)
## Question:
I have an Iceberg table that is partitioned as
```
# Equivalent PyIceberg spec; the source_id/field_id values here are illustrative
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.transforms import BucketTransform, DayTransform

partition_spec = PartitionSpec(
    PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="received_day"),
    PartitionField(source_id=2, field_id=1001, transform=BucketTransform(num_buckets=5), name="id_bucket"),
    PartitionField(source_id=3, field_id=1002, transform=BucketTransform(num_buckets=5), name="digest_bucket"),
)
```
When writing to the table, a failed commit can leave multiple Parquet files
behind in S3 under prefixes like
```
my_table/data/received_day=2026-01-22/id_bucket=2/digest_bucket=2/00000-19-9e0ba586-49c3-4f1e-b273-3db0c7fd0bda-0-00002.parquet
```
I want to add these files to the table so the data is incorporated. I'd
rather register the files in place than move them, load them, and insert
them "manually" into the table.
Using
```
with table.transaction() as tx:
    tx.add_files(
        [
            "s3://my_bucket/my_table/data/received_day=2026-01-22/id_bucket=2/digest_bucket=2/00000-19-9e0ba586-49c3-4f1e-b273-3db0c7fd0bda-0-00002.parquet",
            ...
        ],
        check_duplicate_files=True,
    )
```
I get
```
Cannot infer partition value from parquet metadata for a non-linear Partition Field: id_bucket with transform bucket[5]
```
because `BucketTransform`'s `preserve_order` is `False`, so the partition
value cannot be derived from the file's Parquet column statistics.
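To illustrate the non-linearity: the Iceberg spec buckets a value by taking
a Murmur3-x86-32 hash before the modulus, so neighbouring source values land
in unrelated buckets and min/max statistics say nothing about the bucket. A
sketch for string values, using the `mmh3` package:
```
# Iceberg's bucket transform per the spec: hash the value's bytes with
# Murmur3-x86-32 (seed 0), mask to non-negative, then take the modulus.
import mmh3

def bucket_for_string(value: str, num_buckets: int) -> int:
    hashed = mmh3.hash(value.encode("utf-8"), signed=False)
    return (hashed & 0x7FFFFFFF) % num_buckets

# "a" and "b" are adjacent as strings but hash to unrelated buckets,
# which is why min == max in the column stats does not pin down a bucket.
```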
The partition information is, however, present in each file's prefix.
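For illustration, the values can be parsed straight back out of the
Hive-style path (plain Python; `partition_values` is a hypothetical helper,
not a PyIceberg API):
```
# Hypothetical helper: recover the partition values that add_files could
# not infer, directly from the key=value segments of the object key.
from urllib.parse import unquote

def partition_values(path: str) -> dict[str, str]:
    values = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep:
            values[key] = unquote(value)
    return values

# e.g. {"received_day": "2026-01-22", "id_bucket": "2", "digest_bucket": "2"}
```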
Is there a way (or what is the best way) to commit these Parquet files into
the table?
Many thanks.