lidavidm commented on issue #11649: URL: https://github.com/apache/arrow/issues/11649#issuecomment-964237889
Flight itself does not implement any readahead. In practice that's fine, because gRPC effectively provides it by buffering incoming data in its HTTP/2 stream window. See this [StackOverflow question](https://stackoverflow.com/questions/64828787/stream-window-size-with-regards-to-http-2-and-stream-performance), for instance. You can observe this with a short demo:

<details>
<summary>Code</summary>

server.py:

```python
import time

import numpy as np
import pyarrow as pa
import pyarrow.flight as flight


class ExampleServer(flight.FlightServerBase):
    def do_get(self, context, ticket):
        num_batches = 5
        delay = 2
        if ticket.ticket:
            num_batches = int(ticket.ticket)
        print("Serving", num_batches, "batches at a", delay, "second interval")
        batch = pa.RecordBatch.from_arrays([
            np.random.rand(4096) for _ in range(4)
        ], names=["a", "b", "c", "d"])

        def _gen():
            # Send the same batch num_batches times, pausing between sends
            for i in range(num_batches):
                time.sleep(delay)
                print("Sending batch at:", time.time())
                yield batch

        return flight.GeneratorStream(batch.schema, _gen())


def main():
    print("PyArrow:", pa.__version__)
    server = ExampleServer("grpc+tcp://localhost:3000")
    print("Serving on port", server.port)
    server.serve()


if __name__ == "__main__":
    main()
```

client.py:

```python
import time

import pyarrow as pa
import pyarrow.flight as flight


def main():
    client = flight.connect("grpc+tcp://localhost:3000")
    stream = client.do_get(flight.Ticket(b""))

    # Do some other work...
    print("Doing some work for 10 seconds at:", time.time())
    time.sleep(10)

    for _ in range(5):
        stream.read_chunk()
        print("Got batch at:", time.time())


if __name__ == "__main__":
    main()
```

</details>

<details>
<summary>Results</summary>

```
# Server side
> python server.py
PyArrow: 5.0.0
Serving on port 3000
Serving 5 batches at a 2 second interval
Sending batch at: 1636470268.1261907
Sending batch at: 1636470270.1284654
Sending batch at: 1636470272.1303897
Sending batch at: 1636470274.1331174
Sending batch at: 1636470276.1359076

# Client side
> python client.py
Doing some work for 10 seconds at: 1636470266.1256127
Got batch at: 1636470276.1363451
Got batch at: 1636470276.1364918
Got batch at: 1636470276.1365817
Got batch at: 1636470276.1366274
Got batch at: 1636470276.1369655
```

</details>

Observe how, from the client's perspective, all the batches arrived at once: while the client was "busy" for 10 seconds, gRPC kept receiving and buffering the batches the server sent every 2 seconds. Is this what you were looking for? If you do want an explicit, application-level readahead buffer on top of this, there is a sketch of one approach below.
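If gRPC's own buffering isn't enough and you want bounded, explicit readahead in the application, you can layer it on the client side. The sketch below is only an illustration (the `readahead` helper and its `capacity` parameter are hypothetical, not part of the Flight API): it drains the `FlightStreamReader` on a background thread into a bounded `queue.Queue` while the main thread consumes batches at its own pace.

```python
import queue
import threading

import pyarrow.flight as flight


def readahead(reader, capacity=4):
    """Hypothetical helper: prefetch up to `capacity` chunks from a
    FlightStreamReader on a background thread and yield their record
    batches to the caller."""
    buf = queue.Queue(maxsize=capacity)

    def _pump():
        try:
            while True:
                # Blocks when the buffer is full, so memory stays bounded
                buf.put(reader.read_chunk())
        except StopIteration:
            pass  # end of stream
        finally:
            buf.put(None)  # sentinel: no more chunks

    threading.Thread(target=_pump, daemon=True).start()

    while True:
        chunk = buf.get()
        if chunk is None:
            break
        yield chunk.data


if __name__ == "__main__":
    # Assumes the demo server above is running on port 3000
    client = flight.connect("grpc+tcp://localhost:3000")
    reader = client.do_get(flight.Ticket(b""))
    for batch in readahead(reader):
        print("Got batch with", batch.num_rows, "rows")
```

The bounded queue provides backpressure: once `capacity` chunks are waiting, the pump thread blocks in `put()` and gRPC's flow control takes over, so the client never holds more than it asked for beyond what gRPC itself buffers.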
