calluw commented on issue #39444:
URL: https://github.com/apache/arrow/issues/39444#issuecomment-1888757088

   I can confirm that reducing the batch size via PyArrow 
(`dataset.to_table(batch_size=1000)` in my reproduction above) does prevent the 
error with a dataset of `2^15 + 1` rows. Specifically, any batch size up to and 
including 2048 (`2^11`) works, but `2^11 + 1` hits the error! The same held for 
`2^16 + 1` rows; however, at `2^17 + 1` rows and above, no batch size was small 
enough to avoid the error.
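   For reference, a minimal sketch of the kind of check described (the actual 
reproduction script is earlier in this thread, not shown here; the dataset 
path, column name, and file format are hypothetical stand-ins):

   ```python
   import pyarrow as pa
   import pyarrow.dataset as ds

   # Hypothetical stand-in dataset: a single int64 column with 2^15 + 1 rows,
   # written out and re-read as a pyarrow Dataset.
   n_rows = 2**15 + 1
   table = pa.table({"x": pa.array(range(n_rows), type=pa.int64())})
   ds.write_dataset(table, "repro_dataset", format="parquet")
   dataset = ds.dataset("repro_dataset", format="parquet")

   # batch_size is forwarded to the underlying Scanner. In the behavior
   # described above, values up to 2048 (2^11) avoided the error at this
   # row count, while 2^11 + 1 triggered it.
   result = dataset.to_table(batch_size=2048)
   assert result.num_rows == n_rows
   ```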

