calluw commented on issue #39444: URL: https://github.com/apache/arrow/issues/39444#issuecomment-1888757088
I can confirm that reducing the batch size via PyArrow (`dataset.to_table(batch_size=1000)` in my reproduction above) prevents the error with a dataset of `2^15 + 1` rows. Specifically, any batch size up to and including 2048 (`2^11`) works, but `2^11 + 1` hits the error. The same held for `2^16 + 1` rows; however, at `2^17 + 1` rows and above, no batch size was small enough to avoid the error.
