pvardanis commented on issue #34607: URL: https://github.com/apache/arrow/issues/34607#issuecomment-1978532812
> If you make 10 concurrent requests to a Python service, the service can only handle one at a time due to the GIL. So the concurrency will not get you any wall-clock-time speedup. @pvardanis notes that they got the same time whether they used a thread pool or ran the requests sequentially. That is the question I am answering. They've simply hijacked this thread to ask whether asyncio would make a difference: I contend it would not.

Hey @lidavidm, I'm still confused about what happens when I send concurrent requests using a `ThreadPoolExecutor` with `max_workers=16`, since my server appears to handle requests in parallel, in contrast to what you're stating:

```
05-Mar-24 11:58:29 - root - INFO: [📡] Starting server on `grpc://0.0.0.0:8080`...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
```

The server hosts an ML model around 6 GB in size, and the prediction step is CPU-bound. It's still not clear to me what happens with the GIL and whether Arrow Flight RPC can process requests in parallel or not. I've also noticed that the higher the `max_workers` value and the number of concurrent requests I send (e.g. 100), the more likely the server is to hang for a bit during the prediction step, probably because CPU usage is at its maximum.
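For reference, the client-side pattern I'm describing is roughly the sketch below. `fetch_prediction` is a placeholder standing in for the actual Flight call (e.g. `client.do_get(...)`); the `time.sleep` simulates the thread blocking on the server's response, during which CPython releases the GIL, so all the requests really are in flight at once from the client's point of view:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_prediction(request_id: int) -> int:
    # Placeholder for the real Flight round-trip; while a thread is blocked
    # on network I/O (or this sleep), the GIL is released.
    time.sleep(0.2)
    return request_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=16) as pool:
    # All 16 requests are submitted concurrently.
    results = list(pool.map(fetch_prediction, range(16)))
elapsed = time.perf_counter() - start

# Wall-clock time is close to one round-trip (~0.2s), not 16 of them,
# because the waiting happens in parallel.
print(results, f"elapsed: {elapsed:.2f}s")
```

So the client-side concurrency itself is not in question; my question is about what happens on the server side once the handlers start doing CPU-bound Python work.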
FWIW, the results are not lost and I still get what I'm supposed to; the responses are just delayed a little, which is of course still way faster than handling the requests one at a time. Here are some benchmarks I ran:

<img width="302" alt="image" src="https://github.com/apache/arrow/assets/37624791/ea15b410-0010-49d7-abe6-0e7a3008616a">

`max_workers=1` shows the exact same behavior as sending the requests sequentially.
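To make sure I understand the GIL point, here's a small stand-alone illustration (nothing Flight-specific, just my understanding of the claim): for pure-Python CPU-bound work on CPython, threads interleave but only one executes bytecode at a time, so a thread pool gives interleaved log output like mine above without any real speedup:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(n: int) -> int:
    # Pure-Python loop: holds the GIL the entire time it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

start = time.perf_counter()
sequential = [cpu_bound(N) for _ in range(4)]
t_seq = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(cpu_bound, [N] * 4))
t_thr = time.perf_counter() - start

# Same answers either way; on CPython the threaded run is not meaningfully
# faster, since only one thread can execute Python bytecode at a time.
print(f"sequential: {t_seq:.2f}s, threaded: {t_thr:.2f}s")
```

If that's accurate, then I assume the parallel-looking logs just reflect the threads being interleaved, and any real parallelism during prediction would have to come from the model's native code releasing the GIL.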
