vigneshsiva11 commented on issue #7973: URL: https://github.com/apache/arrow-rs/issues/7973#issuecomment-3853537062
Thanks @alamb for the confirmation! I’ll start by adding regression tests that reproduce the overflow for large string/binary columns, and I’ll include coverage for the different Parquet string encodings (PLAIN, DELTA_LENGTH_BYTE_ARRAY, DELTA_BYTE_ARRAY, and RLE_DICTIONARY). Once the tests are in place, I’ll move on to exploring a minimal batching change at the Parquet reader layer that avoids offset overflows without impacting the common path.
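For illustration only, here is a rough sketch of the batching idea, assuming the overflow comes from the summed value lengths of one batch exceeding what 32-bit offsets can address (`i32::MAX` bytes). The helper `split_rows_by_byte_limit` and the constant `MAX_I32_OFFSET` are hypothetical names made up for this sketch; they are not part of arrow-rs or the Parquet reader, and the actual change may look quite different.

```rust
use std::ops::Range;

/// Largest total byte length addressable by 32-bit (i32) offsets.
/// Hypothetical constant for this sketch, not an arrow-rs item.
const MAX_I32_OFFSET: usize = i32::MAX as usize;

/// Hypothetical helper: split rows into consecutive ranges whose summed
/// value lengths each stay within `limit` bytes.
/// Returns `None` if a single value alone exceeds `limit`, since no
/// amount of batching can make that value fit.
fn split_rows_by_byte_limit(lengths: &[usize], limit: usize) -> Option<Vec<Range<usize>>> {
    let mut ranges = Vec::new();
    let mut start = 0; // first row of the batch being built
    let mut total = 0usize; // bytes accumulated in the current batch

    for (i, &len) in lengths.iter().enumerate() {
        if len > limit {
            return None;
        }
        // `total <= limit` always holds, so the subtraction cannot underflow.
        // Close the current batch before adding a value that would not fit.
        if len > limit - total {
            ranges.push(start..i);
            start = i;
            total = 0;
        }
        total += len;
    }

    // Emit the final, possibly partial, batch.
    if start < lengths.len() {
        ranges.push(start..lengths.len());
    }
    Some(ranges)
}
```

Calling `split_rows_by_byte_limit(&lengths, MAX_I32_OFFSET)` on three 1 GiB values would put each value in its own batch, because any two together exceed `i32::MAX` bytes. When all the lengths fit under the limit, a single range covering every row is returned, so the common path sees one batch as before.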
