lidavidm commented on issue #11649:
URL: https://github.com/apache/arrow/issues/11649#issuecomment-964237889


   Flight itself does not implement any readahead. However, that's fine, as gRPC itself effectively does this by buffering data. See this [StackOverflow question](https://stackoverflow.com/questions/64828787/stream-window-size-with-regards-to-http-2-and-stream-performance), for instance.
   
   You can observe this with a short demo:
   
   <details>
   <summary>Code</summary>
   
   server.py:
   ```python
   import time
   
   import numpy as np
   import pyarrow as pa
   import pyarrow.flight as flight
   
   
   class ExampleServer(flight.FlightServerBase):
       def do_get(self, context, ticket):
           num_batches = 5
           delay = 2
           if ticket.ticket:
               num_batches = int(ticket.ticket)
           print("Serving", num_batches, "batches at a", delay, "second 
interval")
           batch = pa.RecordBatch.from_arrays([
               np.random.rand(4096)
               for _ in range(4)
           ], names=["a", "b", "c", "d"])
   
           def _gen():
               for i in range(num_batches):
                   time.sleep(delay)
                   print("Sending batch at:", time.time())
                   yield batch
   
           return flight.GeneratorStream(batch.schema, _gen())
   
   
   def main():
       print("PyArrow:", pa.__version__)
       server = ExampleServer("grpc+tcp://localhost:3000")
       print("Serving on port", server.port)
       server.serve()
   
   
   if __name__ == "__main__":
       main()
   ```
   
   client.py:
   ```python
   import time
   
   import pyarrow as pa
   import pyarrow.flight as flight
   
   
   def main():
       client = flight.connect("grpc+tcp://localhost:3000")
       stream = client.do_get(flight.Ticket(b""))
       # Do some other work...
       print("Doing some work for 10 seconds at:", time.time())
       time.sleep(10)
       for _ in range(5):
           stream.read_chunk()
           print("Got batch at:", time.time())
   
   
   if __name__ == "__main__":
       main()
   ```
   
   </details>
   
   <details>
   <summary>Results</summary>
   
   ```
   # Server side
   > python server.py 
   PyArrow: 5.0.0
   Serving on port 3000
   Serving 5 batches at a 2 second interval
   Sending batch at: 1636470268.1261907
   Sending batch at: 1636470270.1284654
   Sending batch at: 1636470272.1303897
   Sending batch at: 1636470274.1331174
   Sending batch at: 1636470276.1359076
   
   # Client side
   > python client.py
   Doing some work for 10 seconds at: 1636470266.1256127
   Got batch at: 1636470276.1363451
   Got batch at: 1636470276.1364918
   Got batch at: 1636470276.1365817
   Got batch at: 1636470276.1366274
   Got batch at: 1636470276.1369655
   ```
   
   </details>
   
   Observe how, from the client's perspective, all five batches arrive at once: the server sent one every 2 seconds while the client was busy, and gRPC buffered them until the client started reading.
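
   If you want explicit, application-level readahead on top of gRPC's buffering (for example, to keep fetching while each batch is being processed), one option is to read the stream on a background thread and hand batches to the consumer through a bounded queue. Here is a minimal sketch; `iter_with_readahead` is a hypothetical helper, not part of the Flight API:

   ```python
   import queue
   import threading

   import pyarrow.flight as flight


   def iter_with_readahead(stream, buffer_size=4):
       """Yield record batches from a Flight stream, prefetching up to
       `buffer_size` batches on a background thread."""
       batches = queue.Queue(maxsize=buffer_size)
       done = object()  # sentinel marking end of stream

       def _reader():
           try:
               while True:
                   # read_chunk() raises StopIteration once the stream is exhausted
                   batches.put(stream.read_chunk().data)
           except StopIteration:
               pass
           finally:
               # Error handling omitted for brevity; always signal completion.
               batches.put(done)

       threading.Thread(target=_reader, daemon=True).start()
       while True:
           batch = batches.get()
           if batch is done:
               return
           yield batch


   def main():
       client = flight.connect("grpc+tcp://localhost:3000")
       stream = client.do_get(flight.Ticket(b""))
       for batch in iter_with_readahead(stream):
           # Process each batch while the reader thread fetches the next ones.
           print("Got batch with", batch.num_rows, "rows")


   if __name__ == "__main__":
       main()
   ```

   The bounded queue caps how many batches sit decoded in memory at once; with an unbounded buffer you would mostly just reproduce the gRPC behavior shown above.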
   
   Is this what you were looking for?

