dhallam opened a new issue, #3032:
URL: https://github.com/apache/iceberg-python/issues/3032
### Question
## Versions:
pyiceberg == 0.11.0
Glue 5.1
Spark 3.5.6
Python 3.13
Amazon S3 (not S3Tables)
## Question:
I have an Iceberg table that is partitioned as
```
# Equivalent PyIceberg spec; the source_id/field_id values here are illustrative
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.transforms import BucketTransform, DayTransform

partition_spec = PartitionSpec(
    PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="received_day"),
    PartitionField(source_id=2, field_id=1001, transform=BucketTransform(num_buckets=5), name="id_bucket"),
    PartitionField(source_id=3, field_id=1002, transform=BucketTransform(num_buckets=5), name="digest_bucket"),
)
```
When writing to the table, a failed commit can leave multiple Parquet files
behind in S3 under prefixes like
```
my_table/data/received_day=2026-01-22/id_bucket=2/digest_bucket=2/00000-19-9e0ba586-49c3-4f1e-b273-3db0c7fd0bda-0-00002.parquet
```
I want to add these files to the table so the data is incorporated. I'd
rather register the files in place than move them, load them, and insert
them "manually" into the table.
Using
```
with table.transaction() as tx:
    tx.add_files(
        [
            "s3://my_bucket/my_table/data/received_day=2026-01-22/id_bucket=2/digest_bucket=2/00000-19-9e0ba586-49c3-4f1e-b273-3db0c7fd0bda-0-00002.parquet",
            ...
        ],
        check_duplicate_files=True,
    )
```
I get
```
Cannot infer partition value from parquet metadata for a non-linear Partition Field: id_bucket with transform bucket[5]
```
because `BucketTransform`'s `preserve_order` is `False`, so the partition
value cannot be derived from the file's Parquet column statistics.
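To illustrate the non-linearity: the Iceberg spec buckets a value by taking
a Murmur3-x86-32 hash before the modulus, so neighbouring source values land
in unrelated buckets and min/max statistics say nothing about the bucket. A
sketch for string values, using the `mmh3` package:
```
# Iceberg's bucket transform per the spec: hash the value's bytes with
# Murmur3-x86-32 (seed 0), mask to non-negative, then take the modulus.
import mmh3

def bucket_for_string(value: str, num_buckets: int) -> int:
    hashed = mmh3.hash(value.encode("utf-8"), signed=False)
    return (hashed & 0x7FFFFFFF) % num_buckets

# "a" and "b" are adjacent as strings but hash to unrelated buckets,
# which is why min == max in the column stats does not pin down a bucket.
```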
The partition information is, however, present in each file's prefix.
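For illustration, the values can be parsed straight back out of the
Hive-style path (plain Python; `partition_values` is a hypothetical helper,
not a PyIceberg API):
```
# Hypothetical helper: recover the partition values that add_files could
# not infer, directly from the key=value segments of the object key.
from urllib.parse import unquote

def partition_values(path: str) -> dict[str, str]:
    values = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep:
            values[key] = unquote(value)
    return values

# e.g. {"received_day": "2026-01-22", "id_bucket": "2", "digest_bucket": "2"}
```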
Is there a way (or what is the best way) to commit these Parquet files into
the table?
Many thanks.