emkornfield commented on a change in pull request #4021:
URL: https://github.com/apache/iceberg/pull/4021#discussion_r798181988
##########
File path: python/src/iceberg/io/base.py
##########
@@ -24,7 +24,40 @@
 """

 from abc import ABC, abstractmethod
-from typing import Union
+from typing import Protocol, Union, runtime_checkable
+
+
+@runtime_checkable
+class InputStream(Protocol):
+    def read(self, n: int) -> bytes:
+        ...

Review comment:
I don't think NativeFile is the right thing to return. NativeFile is Arrow's file abstraction, and its implementations are for the most part native C++. [PythonFile](https://arrow.apache.org/docs/python/generated/pyarrow.PythonFile.html?highlight=pythonfile#pyarrow.PythonFile) is the adapter from file-like Python objects to Arrow's file interface. I think IOBase probably guarantees this, and looking at the [source](https://github.com/apache/arrow/blob/e9e16c9da7a76718640f2b3f23200a3755790011/python/pyarrow/io.pxi#L679), it seems to either do isinstance checks for IOBase or rely on duck typing.

What we probably want to do, in whatever code passes these interfaces to Arrow, is check whether they are an instance of a wrapper around NativeFile and, if so, pass the native file instead. That avoids the GIL and the non-zero indirection costs mentioned in the PythonFile docs.
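
A minimal sketch of the unwrapping check suggested in the comment, not code from the PR. `NativeBackedStream`, its `native_file` attribute, and `unwrap_for_arrow` are hypothetical names invented for illustration, and `io.BytesIO` stands in for a `pyarrow.NativeFile` so the snippet runs without Arrow installed.

```python
import io
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class InputStream(Protocol):
    def read(self, n: int) -> bytes:
        ...


class NativeBackedStream:
    """Hypothetical wrapper that satisfies InputStream by delegating to a native file."""

    def __init__(self, native_file: Any) -> None:
        # Assumption: in real code this would hold a pyarrow.NativeFile.
        self.native_file = native_file

    def read(self, n: int) -> bytes:
        return self.native_file.read(n)


def unwrap_for_arrow(stream: InputStream) -> Any:
    """Pick the object to hand to Arrow (hypothetical helper).

    If the stream wraps a native file, return that file directly so reads
    do not go back through Python. Otherwise return the stream unchanged;
    the caller would pass it to Arrow as an ordinary file-like object,
    which is the case the PythonFile adapter exists for.
    """
    if isinstance(stream, NativeBackedStream):
        return stream.native_file
    return stream


# Stand-in objects: BytesIO plays the role of the native file here.
native = io.BytesIO(b"iceberg")
wrapped = NativeBackedStream(native)
plain = io.BytesIO(b"python-only")

arrow_input_a = unwrap_for_arrow(wrapped)  # the underlying "native" object
arrow_input_b = unwrap_for_arrow(plain)    # unchanged file-like object
```

The check lives at the boundary where streams are passed to Arrow, so the `InputStream` protocol itself can stay free of any Arrow types.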