jdanap opened a new issue, #38565:
URL: https://github.com/apache/arrow/issues/38565
### Describe the bug, including details regarding any error messages, version, and platform.
Hello. I am seeing `pyarrow.flight.FlightClient` crash with a segmentation
fault whenever the client is used concurrently.
We have narrowed the cause of the failure down to the FlightClient's
`authenticate` function, so I have simplified the reproduction to the
following:
```python
import concurrent.futures
from time import perf_counter

from pyarrow import flight
from pyarrow.flight import ClientAuthHandler

HOST = 'grpc://obviously.does.not.exist.io:80'
CONCURRENT_TASKS = 20

# A single client instance shared by all worker threads.
client: flight.FlightClient = flight.FlightClient(HOST)


class NoOpClientAuth(ClientAuthHandler):
    def __init__(self):
        self.token = None

    def authenticate(self, outgoing, incoming):
        print("no-op")

    def get_token(self):
        return self.token


def task1():
    return client.authenticate(NoOpClientAuth())


def task2():
    return client.authenticate(NoOpClientAuth())


def run_concurrently(tasks_info, max_workers, cancel_future=None):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = []
        # Each entry is [callable, *args]; submit them all to the pool.
        for task_and_parameters in tasks_info:
            task, parameters = task_and_parameters[0], task_and_parameters[1:]
            modified_task = task
            futures.append(executor.submit(modified_task, *parameters))
        results = []
        # Collect results in submission order; re-raise the first failure.
        for future in futures:
            try:
                results.append(future.result())
            except Exception:
                if cancel_future:
                    cancel_future.cancel()
                raise
    return results


if __name__ == '__main__':
    tasks = []
    for _ in range(CONCURRENT_TASKS):
        tasks.append([task1])
        tasks.append([task2])
    t1_start = perf_counter()
    results = run_concurrently(tasks, 10)
    t1_stop = perf_counter()
    print("Elapsed time during the whole program in seconds:",
          t1_stop - t1_start)
```
With a lower level of concurrency, we are less likely to encounter the crash.
Instead, we get a 404 or DNS resolution error as expected (or, in the real use
case, get through authentication just fine). But set it to 20 and I can
reproduce the crash consistently: after a varying number of `authenticate`
calls, the script above exits with a segmentation fault in a Unix environment.
Debugging in dev mode, we located the point of failure
[here](https://github.com/apache/arrow/blob/5d6192c7db5e13d72d79bc9ca470c544344ec52b/python/pyarrow/src/arrow/python/flight.cc#L68),
where a
[std::bad_function_call](https://en.cppreference.com/w/cpp/utility/functional/bad_function_call)
is thrown.
We could work around this by always instantiating a new client instead of
reusing a singleton. However, to my understanding,
[reusing the same client is encouraged](https://arrow.apache.org/docs/cpp/flight.html#re-use-clients-whenever-possible).
I am wondering what the underlying cause of the failure is, and whether
re-authenticating on a reused client is simply not an intended usage in
this case.
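
For reference, the workaround of instantiating a new client per call could look roughly like the sketch below. The helper names (`authenticate_with_fresh_client`, `authenticate_all`) and the `make_client` / `make_handler` factory parameters are hypothetical and only there for illustration. The sketch assumes nothing about the client beyond an `authenticate(handler)` method, and it gives up the connection reuse that the Flight docs recommend.

```python
import concurrent.futures


def authenticate_with_fresh_client(make_client, make_handler):
    """Build a dedicated client, authenticate it, and hand it back.

    make_client and make_handler are hypothetical zero-argument factories,
    so no client object is ever shared between worker threads.
    """
    client = make_client()
    client.authenticate(make_handler())
    return client


def authenticate_all(make_client, make_handler, count, max_workers=10):
    """Run `count` authentications concurrently, one fresh client each."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(authenticate_with_fresh_client, make_client, make_handler)
            for _ in range(count)
        ]
        # result() re-raises any exception raised inside a worker.
        return [future.result() for future in futures]
```

With the reproduction script above, `make_client` would be something like `lambda: flight.FlightClient(HOST)` and `make_handler` would be `NoOpClientAuth`.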
### Component(s)
FlightRPC, Python