felipecrv commented on issue #39682: URL: https://github.com/apache/arrow/issues/39682#issuecomment-1932870929
@mapleFU I understand the error is raised from `BaseBinaryBuilder<T>` (the superclass of `StringBuilder`). The issue is not an inability to allocate more than 2 GB of RAM; the issue is that a `StringArray` uses 32-bit offsets, so its offsets buffer can't address more than 2 GB of string data. The Parquet reader should figure out how to read these strings into a `LargeStringArray` instead. That means writing 64-bit offsets into the offsets buffer of the resulting `LargeStringArray`.
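To make the 32-bit limit concrete, here is a small pure-Python model of the variable-length string layout: value `i` occupies `data[offsets[i]:offsets[i + 1]]`, so the final offset equals the total number of data bytes and must fit in the offset type. This is only a sketch under that assumption; `compute_offsets` and `choose_string_type` are hypothetical helpers, not Arrow or Parquet reader APIs, and the model works on byte lengths so it never allocates the actual data.

```python
INT32_MAX = 2**31 - 1  # largest 32-bit signed offset (StringArray)
INT64_MAX = 2**63 - 1  # largest 64-bit signed offset (LargeStringArray)


def compute_offsets(lengths, large=False):
    """Hypothetical helper: build an offsets buffer from value byte lengths.

    offsets[i] and offsets[i + 1] delimit value i in the data buffer, so the
    last offset is the total data size and must fit in the offset type.
    """
    limit = INT64_MAX if large else INT32_MAX
    width = 64 if large else 32
    offsets = [0]
    for n in lengths:
        end = offsets[-1] + n
        if end > limit:
            raise OverflowError(f"{end} data bytes do not fit in {width}-bit offsets")
        offsets.append(end)
    return offsets


def choose_string_type(lengths):
    """Hypothetical reader-side policy: fall back to 64-bit offsets when needed."""
    return "string" if sum(lengths) <= INT32_MAX else "large_string"


if __name__ == "__main__":
    print(compute_offsets([2, 0, 3]))  # [0, 2, 2, 5]
    huge = [2**30, 2**30]  # two 1 GiB values: 2**31 bytes in total
    print(choose_string_type(huge))  # large_string
    print(compute_offsets(huge, large=True)[-1])  # 2147483648
```

With two 1 GiB values, the last offset would be 2^31, one past the largest 32-bit signed value, so `compute_offsets(huge)` raises while the 64-bit variant succeeds. That mirrors the suggestion above: the data itself fits in memory; only the offset width needs to grow.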
