greenlaw opened a new issue, #2795: URL: https://github.com/apache/iceberg-python/issues/2795
### Apache Iceberg version

0.10.0 (latest release)

### Please describe the bug 🐞

When using a small value for `write.target-file-size-bytes`, appending any record larger than this value results in the following exception:

```
  File "/var/lang/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 485, in append
    data_files = list(
                 ^^^^^
  File "/var/lang/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py", line 2774, in _dataframe_to_data_files
    for batches in bin_pack_arrow_table(partition.arrow_table_partition, target_file_size)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py", line 2588, in bin_pack_arrow_table
    batches = tbl.to_batches(max_chunksize=target_rows_per_file)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 5099, in pyarrow.lib.Table.to_batches
ValueError: 'max_chunksize' should be strictly positive
```

The offending lines appear to be here: https://github.com/apache/iceberg-python/blob/abae20f89fd88d761317267d7ceeb226e84a78ca/pyiceberg/io/pyarrow.py#L2680-L2693

When a single row is larger than the target file size, the integer division `target_rows_per_file = target_file_size // avg_row_size_bytes` yields 0, which pyarrow rejects as a chunk size. Changing it to `target_rows_per_file = max(1, target_file_size // avg_row_size_bytes)` would ensure at least one row is always written per chunk; see the sketch after the checklist below.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
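For illustration, here is a minimal, self-contained sketch of the proposed guard. `bin_pack_sketch` and the 1 KiB target are hypothetical stand-ins, not the actual pyiceberg code; the real `bin_pack_arrow_table` additionally bin-packs the resulting batches by size.

```python
import pyarrow as pa


def bin_pack_sketch(tbl: pa.Table, target_file_size: int) -> list[pa.RecordBatch]:
    """Hypothetical, simplified stand-in for pyiceberg's bin_pack_arrow_table."""
    avg_row_size_bytes = tbl.nbytes // tbl.num_rows
    # Without max(1, ...), a row larger than target_file_size makes the integer
    # division yield 0, and pyarrow raises:
    #   ValueError: 'max_chunksize' should be strictly positive
    target_rows_per_file = max(1, target_file_size // avg_row_size_bytes)
    return tbl.to_batches(max_chunksize=target_rows_per_file)


# One row whose payload alone exceeds the 1 KiB target reproduces the
# zero-chunksize case; with the guard it yields a single one-row batch.
tbl = pa.table({"payload": ["x" * 10_000]})
print([b.num_rows for b in bin_pack_sketch(tbl, target_file_size=1024)])  # [1]
```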
