qed- opened a new issue, #39237:
URL: https://github.com/apache/arrow/issues/39237

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   We have been seeing occasional segfaults in a Python Arrow Flight client (and sometimes in the server) when running with TLS enabled. We can reliably reproduce the problem with the minimal code and the specific Arrow stream payload below.
   
   Environment:
   - Microsoft Windows 10 Enterprise, version 10.0.19045 (build 19045)
   - Python 3.10.12
   - pyarrow==14.0.1
   - pandas==2.1.4
   
   server.py implements a server whose do_get() replays a particular Arrow stream from the attached file [stream.arrow](https://github.com/apache/arrow/files/13684485/stream.zip) (the data has been anonymised). client.py calls do_get() and reads the result into a pandas DataFrame.
   
   The TLS certificates can be found at https://github.com/apache/arrow-testing/tree/master/data/flight
   
   Once both processes are running, the problem should occur after a few minutes:
   ```
   (MinimalRepro) D:\FlightMinimalRepro>python client.py
   Windows fatal exception: access violation
   
   Current thread 0x0000795c (most recent call first):
     File "D:\FlightMinimalRepro\client.py", line 34 in <module>
   
   
   (MinimalRepro) D:\FlightMinimalRepro>python server.py
   INFO:__main__:Starting serve
   Windows fatal exception: access violation
   
   Thread 0x00008cf8 (most recent call first):
     File "D:\FlightMinimalRepro\server.py", line 39 in <module>
   ```
   
   The problem appears to be TLS-related (it does not occur on connections without TLS) and timing-related (it does not occur without the call to sleep() between batches in the server).
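
One way to check the TLS observation in isolation is to make the transport switchable, so the same server can be started as a plaintext control. The sketch below is a hypothetical helper and not part of the original repro: `flight_location` and `server_kwargs` are names introduced here for illustration, and it assumes the `grpc+tcp` scheme for the non-TLS run.

```python
from typing import Any, Dict, Optional


def flight_location(use_tls: bool, host: str = "localhost", port: int = 8815) -> str:
    """Return a Flight URI: grpc+tls for TLS, grpc+tcp for the plaintext control."""
    scheme = "grpc+tls" if use_tls else "grpc+tcp"
    return f"{scheme}://{host}:{port}"


def server_kwargs(
    use_tls: bool,
    cert_chain: Optional[str] = None,
    private_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for FlightServer(...) in either mode (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"location": flight_location(use_tls)}
    if use_tls:
        if cert_chain is None or private_key is None:
            raise ValueError("TLS mode needs both a certificate chain and a private key")
        # Same shape as in server.py: a list of (cert_chain, private_key) pairs.
        kwargs["tls_certificates"] = [(cert_chain, private_key)]
        kwargs["verify_client"] = False
    return kwargs
```

With this, `FlightServer(**server_kwargs(use_tls=False))` would give the plaintext control; the client would then connect to the matching `grpc+tcp` location without `tls_root_certs`. The `sleep(0.5)` between batches could be made a parameter in the same way to test the timing observation.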
   
   ### server.py
   ```python
   import faulthandler
   faulthandler.enable()
   
   from typing import Iterator
   import pyarrow as pa
   import pyarrow.flight
   from time import sleep
   import logging
   import sys
   
   log = logging.getLogger(__name__)
   
   class FlightServer(pa.flight.FlightServerBase):
       def do_get(self, context, ticket):
           source = pa.OSFile("stream.arrow", mode="r")
           reader = pyarrow.ipc.open_stream(source)
           return pyarrow.flight.GeneratorStream(reader.schema, self._get_batches(reader))
       
       def _get_batches(self, reader: pa.ipc.RecordBatchStreamReader) -> Iterator[pa.RecordBatch]:
           for batch in reader:
               yield batch
               sleep(0.5)
       
   if __name__ == "__main__":   
       source = pa.OSFile("stream.arrow", mode="r")
       reader = pyarrow.ipc.open_stream(source)
       table = reader.read_pandas()
   
       logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
       location = "grpc+tls://localhost:8815"
         
       with open("cert1.pem") as cert_file:
           tls_cert_chain = cert_file.read()
   
       with open("cert1.key") as key_file:
           tls_private_key = key_file.read()
   
       tls_certificates = [(tls_cert_chain, tls_private_key)]
       server = FlightServer(location=location, tls_certificates=tls_certificates, verify_client=False)
       
       log.info("Starting serve")
       server.serve()
   ```
   
   ### client.py
   ```python
   import faulthandler
   faulthandler.enable()
   
   import pyarrow as pa
   from pyarrow.flight  import FlightClient, Ticket
   import pandas as pd
   
   def repro_do_get() -> pa.flight.FlightStreamReader:
       with open("root-ca.pem", "rb") as cert_file:
           tls_root_certs = cert_file.read()
   
       endpoint_client = FlightClient("grpc+tls://localhost:8815", tls_root_certs=tls_root_certs)
       ticket = Ticket(b"")
       reader = endpoint_client.do_get(ticket)
       return reader    
   
   def repro_do_get_df() -> pd.DataFrame:
       reader = repro_do_get()
       return reader.read_pandas()
       
   if __name__ == "__main__":
       df = repro_do_get_df()
       pass
   ```
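
Because the crash is intermittent, it may help to rerun the client automatically until it fails. The following is a minimal sketch using only the standard library; `run_until_failure` and `CLIENT_CMD` are hypothetical names, and it assumes client.py sits in the working directory and that a crash surfaces as a non-zero return code.

```python
import subprocess
import sys
from typing import List, Optional, Tuple

# Assumption: client.py is in the current working directory.
CLIENT_CMD = [sys.executable, "client.py"]


def run_until_failure(cmd: List[str], max_runs: int = 100) -> Optional[Tuple[int, int]]:
    """Run cmd repeatedly; return (attempt, returncode) for the first non-zero exit.

    Returns None if every run exits with status 0.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return attempt, result.returncode
    return None
```

Calling `run_until_failure(CLIENT_CMD)` while server.py is running would report which attempt failed and with what status.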
   
   [stream.zip](https://github.com/apache/arrow/files/13684485/stream.zip)
   
   
   ### Component(s)
   
   FlightRPC, Python


