greenlaw opened a new issue, #2795:
URL: https://github.com/apache/iceberg-python/issues/2795

   ### Apache Iceberg version
   
   0.10.0 (latest release)
   
   ### Please describe the bug 🐞
   
   When `write.target-file-size-bytes` is set to a small value, appending any 
record larger than this value results in the following exception:
   
   ```
   File "/var/lang/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 485, in append
   data_files = list(
   ^^^^^
   File "/var/lang/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py", line 2774, in _dataframe_to_data_files
   for batches in bin_pack_arrow_table(partition.arrow_table_partition, target_file_size)
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/var/lang/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py", line 2588, in bin_pack_arrow_table
   batches = tbl.to_batches(max_chunksize=target_rows_per_file)
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "pyarrow/table.pxi", line 5099, in pyarrow.lib.Table.to_batches
   ValueError: 'max_chunksize' should be strictly positive
   ```
   
   The offending lines appear to be here: 
https://github.com/apache/iceberg-python/blob/abae20f89fd88d761317267d7ceeb226e84a78ca/pyiceberg/io/pyarrow.py#L2680-L2693
   
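   When the average row size exceeds the target file size, the floor division in 
`target_rows_per_file = target_file_size // avg_row_size_bytes` yields 0, which 
`to_batches` rejects. It could probably be changed to 
`target_rows_per_file = max(1, target_file_size // avg_row_size_bytes)` 
to ensure at least 1 row always gets written per chunk.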
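
A minimal sketch of the proposed clamp, written in plain Python so it runs without pyiceberg or pyarrow. The helper name `rows_per_chunk` and its parameters are hypothetical and only mirror the two variables quoted above; this is not the actual body of `bin_pack_arrow_table`.

```python
def rows_per_chunk(target_file_size: int, total_bytes: int, num_rows: int) -> int:
    """Hypothetical helper: rows per chunk, clamped to at least 1.

    Mirrors the arithmetic quoted in the issue, not pyiceberg's real code.
    """
    if num_rows <= 0:
        raise ValueError("num_rows must be positive")

    # Average size of one row; floored at 1 byte to avoid dividing by zero.
    avg_row_size_bytes = max(1, total_bytes // num_rows)

    # Without the clamp this is 0 whenever a row is larger than the target.
    unclamped = target_file_size // avg_row_size_bytes

    # Proposed fix: always allow at least one row per chunk.
    return max(1, unclamped)


# Rows average 500 bytes, target is 100 bytes: unclamped result would be 0.
small_target = rows_per_chunk(target_file_size=100, total_bytes=1000, num_rows=2)

# Rows average 100 bytes, target is 1000 bytes: the clamp has no effect.
normal_target = rows_per_chunk(target_file_size=1000, total_bytes=1000, num_rows=10)
```

With the clamp, a single oversized row is still written as its own chunk, so the resulting file may exceed `write.target-file-size-bytes`; that seems preferable to failing the append.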
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

