[GitHub] [arrow] maartenbreddels commented on pull request #8755: ARROW-10709: [Python] Allow PythonFile.read() to always return a buffer

GitBox Tue, 24 Nov 2020 06:45:23 -0800


maartenbreddels commented on pull request #8755:
URL: https://github.com/apache/arrow/pull/8755#issuecomment-733019321



   No, but if I call PythonFile.read_buffer() from Python, it will call 
`Result<std::shared_ptr<Buffer>> Read(int64_t nbytes)`, which will call 
PyReadableFile.read() and avoid a memory copy if that returns a 
memoryview/buffer, so all is well.
   
   But if I now pass this file object to, say, the Parquet reader, it will (via a 
path unknown to me) call PyReadableFile.read() as well, but via 
`Result<int64_t> Read(int64_t nbytes, void* out)`, which will fail because it 
only accepts a bytes object as the return value of PyReadableFile.read().
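
   To make the mismatch concrete, here is a minimal pure-Python sketch. It does 
not use Arrow's actual code: `MemoryViewFile`, `read_as_buffer`, and 
`read_into_strict` are hypothetical stand-ins for the file object and the two 
`Read` overloads, and the strict bytes check is an assumption modeled on the 
behavior described above.

```python
class MemoryViewFile:
    """Hypothetical file-like object whose read() returns a memoryview, not bytes."""

    def __init__(self, data: bytes):
        self._view = memoryview(data)
        self._pos = 0

    def read(self, nbytes: int) -> memoryview:
        # Slicing a memoryview shares the underlying memory instead of copying it.
        chunk = self._view[self._pos:self._pos + nbytes]
        self._pos += len(chunk)
        return chunk


def read_as_buffer(f, nbytes: int) -> memoryview:
    """Stand-in for the buffer-returning Read(nbytes): any buffer-protocol object is fine."""
    return memoryview(f.read(nbytes))


def read_into_strict(f, nbytes: int, out: bytearray) -> int:
    """Stand-in for Read(nbytes, void* out) as described above: insists on bytes (assumed)."""
    data = f.read(nbytes)
    if not isinstance(data, bytes):
        raise TypeError(f"read() must return bytes, got {type(data).__name__}")
    out[:len(data)] = data
    return len(data)


payload = b"hello world"
print(bytes(read_as_buffer(MemoryViewFile(payload), 5)))
try:
    read_into_strict(MemoryViewFile(payload), 5, bytearray(5))
except TypeError as exc:
    print("strict path failed:", exc)
```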
   
   So this PR allows PyReadableFile.read to return a memoryview, and that 
will work within all of `pyarrow`. This file object will not, however, work 
with pandas (which also expects a bytes object, AFAICT).
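
   As an illustration of the relaxed behavior, the sketch below copies from 
whatever `read()` returns as long as it supports the buffer protocol. The 
helper `read_into` is hypothetical and written in plain Python; it is not the 
change made in the PR itself.

```python
import io


def read_into(read, nbytes: int, out: bytearray) -> int:
    """Copy up to nbytes from read(nbytes) into out and return the byte count.

    Accepts bytes, bytearray, memoryview, or any other buffer-protocol object.
    """
    # memoryview() raises TypeError for objects without the buffer protocol.
    view = memoryview(read(nbytes)).cast("B")
    n = len(view)
    if n > len(out):
        raise ValueError("output buffer too small")
    out[:n] = view  # a single copy into the caller's buffer
    return n


out = bytearray(5)

# A reader that returns bytes works as before.
read_into(io.BytesIO(b"hello world").read, 5, out)
print(out)

# A reader that returns a memoryview now works too.
source = memoryview(b"HELLO WORLD")
read_into(lambda n: source[:n], 5, out)
print(out)
```

   A consumer that still insists on a bytes object would need an explicit 
`bytes(view)` conversion, which makes a copy.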
   
   So the other solution would be to call `PyReadableFile.read_buffer`, when it 
is implemented, from `Result<std::shared_ptr<Buffer>> Read(int64_t nbytes)`.
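
   A rough sketch of that alternative, again in plain Python with hypothetical 
names (`DualFile`, `read_buffer_from`): the buffer-returning path prefers an 
optional `read_buffer` method and falls back to `read()`, so `read()` can keep 
returning bytes for consumers that require them.

```python
import io


class DualFile:
    """Hypothetical file object: read() returns bytes, read_buffer() returns a memoryview."""

    def __init__(self, data: bytes):
        self._view = memoryview(data)
        self._pos = 0

    def read_buffer(self, nbytes: int) -> memoryview:
        chunk = self._view[self._pos:self._pos + nbytes]  # no copy
        self._pos += len(chunk)
        return chunk

    def read(self, nbytes: int) -> bytes:
        return bytes(self.read_buffer(nbytes))  # copies, but always yields bytes


def read_buffer_from(f, nbytes: int) -> memoryview:
    """Use f.read_buffer() when the file object implements it, else fall back to f.read()."""
    read_buffer = getattr(f, "read_buffer", None)
    if callable(read_buffer):
        return memoryview(read_buffer(nbytes))
    return memoryview(f.read(nbytes))


f = DualFile(b"hello world")
print(bytes(read_buffer_from(f, 5)))   # served by read_buffer()
print(f.read(6))                       # plain read() still returns bytes

g = io.BytesIO(b"hello world")         # has no read_buffer(), so read() is used
print(bytes(read_buffer_from(g, 5)))
```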
   
   Does that make sense? (It took me a while to wrap my head around it.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
