maartenbreddels commented on pull request #8755: URL: https://github.com/apache/arrow/pull/8755#issuecomment-733019321
No, but if I call `PythonFile.read_buffer()` from Python, it calls `Result<std::shared_ptr<Buffer>> Read(int64_t nbytes)`, which calls `PyReadableFile.read()` and makes zero memory copies if that returns a memoryview/buffer, all happy. But if I now pass this file object to, say, the parquet reader, it will (via a path unknown to me) call `PyReadableFile.read()` as well, but via `Result<int64_t> Read(int64_t nbytes, void* out)`, which fails because it only accepts a bytes object as the return value of `PyReadableFile.read`.

So this PR allows `PyReadableFile.read` to return a memoryview, and that will work within all of `pyarrow`. This file object will not, however, work with pandas (which also expects a bytes object, AFAICT). The other solution would be for `Result<std::shared_ptr<Buffer>> Read(int64_t nbytes)` to call `PyReadableFile.read_buffer` when it is implemented.

Does that make sense? (It took me a while to wrap my head around it.)
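To make the two code paths concrete, here is a minimal pure-Python sketch. It does not use or reproduce pyarrow's C++ implementation: `MemoryViewFile`, `read_into_bytes_only`, `read_into_any_buffer`, and `read_preferring_read_buffer` are hypothetical names invented for illustration. They only mimic the difference between a reader that insists on `bytes` and one that accepts any object supporting the buffer protocol, plus the alternative of preferring `read_buffer` when the file object provides it.

```python
class MemoryViewFile:
    """Hypothetical file-like object whose read() returns a memoryview."""

    def __init__(self, data: bytes):
        self._data = memoryview(data)
        self._pos = 0

    def read(self, nbytes: int = -1):
        if nbytes < 0:
            nbytes = len(self._data) - self._pos
        # Slicing a memoryview does not copy the underlying bytes.
        chunk = self._data[self._pos:self._pos + nbytes]
        self._pos += len(chunk)
        return chunk


def read_into_bytes_only(f, nbytes: int, out: bytearray) -> int:
    """Stand-in for a reader that only accepts bytes from f.read()."""
    result = f.read(nbytes)
    if not isinstance(result, bytes):
        raise TypeError("read() must return bytes")
    out[:len(result)] = result
    return len(result)


def read_into_any_buffer(f, nbytes: int, out: bytearray) -> int:
    """Stand-in for a reader that accepts any buffer-protocol object."""
    view = memoryview(f.read(nbytes))  # works for bytes, bytearray, memoryview
    n = view.nbytes
    out[:n] = view  # single copy into the caller-provided buffer
    return n


def read_preferring_read_buffer(f, nbytes: int):
    """Sketch of the alternative: use read_buffer() if the object has one."""
    reader = getattr(f, "read_buffer", None)
    if callable(reader):
        return reader(nbytes)
    return f.read(nbytes)


if __name__ == "__main__":
    out = bytearray(5)
    n = read_into_any_buffer(MemoryViewFile(b"hello world"), 5, out)
    print(n, bytes(out))
    try:
        read_into_bytes_only(MemoryViewFile(b"hello world"), 5, bytearray(5))
    except TypeError as exc:
        print("bytes-only reader rejected the memoryview:", exc)
```

The bytes-only function plays the role of the `Read(int64_t nbytes, void* out)` path described above, and the buffer-accepting one plays the role of the behaviour this PR is after. A consumer outside pyarrow that also checks for `bytes` would still reject such a file object.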
