jdanap opened a new issue, #38565:
URL: https://github.com/apache/arrow/issues/38565

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hello. `pyarrow.flight.FlightClient` crashes with a segmentation fault when a single client is used concurrently from multiple threads.
   
   We have narrowed the cause down to the FlightClient's `authenticate` function, so I have reduced the reproduction to the following:
   ```python
   import concurrent.futures
   from time import perf_counter
   
   from pyarrow import flight
   from pyarrow.flight import ClientAuthHandler
   
   HOST = 'grpc://obviously.does.not.exist.io:80'
   CONCURRENT_TASKS=20
   client: flight.FlightClient = flight.FlightClient(HOST)
   class NoOpClientAuth(ClientAuthHandler):
       def __init__(self):
           self.token = None
   
       def authenticate(self, outgoing, incoming):
           print("no-op")
   
       def get_token(self):
           return self.token
   
   def task1():
       return client.authenticate(NoOpClientAuth())
   
   def task2():
       return client.authenticate(NoOpClientAuth())
   
   def run_concurrently(tasks_info, max_workers, cancel_future=None):
       with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
           futures = []
           for task_and_parameters in tasks_info:
               task, parameters = task_and_parameters[0], task_and_parameters[1:]
               modified_task = task
               futures.append(executor.submit(modified_task, *parameters))
   
           results = []
           for future in futures:
               try:
                   results.append(future.result())
               except Exception:
                   if cancel_future:
                       cancel_future.cancel()
                   raise
           return results
   
   if __name__ == '__main__':
   
       tasks = []
       for _ in range(CONCURRENT_TASKS):
           tasks.append([task1])
           tasks.append([task2])
   
       t1_start = perf_counter()
       results = run_concurrently(tasks, 10)
       t1_stop = perf_counter()
       print("Elapsed time during the whole program in seconds:", t1_stop - t1_start)
   ```
   
   At lower levels of concurrency the crash is less likely; instead we get the expected 404 or DNS resolution error (or, in the real use case, authentication succeeds without problems). With `CONCURRENT_TASKS` set to 20, however, I can reproduce the crash consistently: after a varying number of `authenticate` calls, the script above exits with a segmentation fault in a Unix environment.
   
   Debugging in dev mode, we located the point of failure [here](https://github.com/apache/arrow/blob/5d6192c7db5e13d72d79bc9ca470c544344ec52b/python/pyarrow/src/arrow/python/flight.cc#L68), where a [std::bad_function_call](https://en.cppreference.com/w/cpp/utility/functional/bad_function_call) is thrown.
   
   We can work around this by always instantiating a new client instead of reusing a singleton. However, to my understanding, [reusing the same client is encouraged](https://arrow.apache.org/docs/cpp/flight.html#re-use-clients-whenever-possible).
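For reference, the workaround looks roughly like the sketch below. It is a minimal illustration rather than our production code: `make_flight_client`, `authenticate_with_fresh_client`, and `run_isolated` are hypothetical helper names, and it assumes the installed pyarrow version provides `FlightClient.close()`.

```python
import concurrent.futures


def make_flight_client(location):
    """Build a FlightClient; pyarrow is imported lazily (hypothetical helper)."""
    from pyarrow import flight

    return flight.FlightClient(location)


def authenticate_with_fresh_client(make_client, make_handler):
    """Create a client for this call only, authenticate, then close it."""
    client = make_client()
    try:
        # In real use, the Flight calls that need the authenticated
        # client would also go inside this try block.
        return client.authenticate(make_handler())
    finally:
        # Assumes FlightClient.close() is available in this pyarrow version.
        client.close()


def run_isolated(make_client, make_handler, n_tasks, max_workers=10):
    """Run n_tasks authenticate calls concurrently, one client per call."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(authenticate_with_fresh_client, make_client, make_handler)
            for _ in range(n_tasks)
        ]
        # future.result() re-raises any exception raised inside a task.
        return [future.result() for future in futures]
```

With the names from the script above, this would be called as `run_isolated(lambda: make_flight_client(HOST), NoOpClientAuth, 2 * CONCURRENT_TASKS)`. No `authenticate` call then shares a client with another thread, at the cost of setting up a new client for every call.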
   
   I am wondering what the root cause of the failure is, and whether re-authenticating on, or reusing, the same client is simply not the intended usage in this case.
   
   ### Component(s)
   
   FlightRPC, Python

