pvardanis commented on issue #34607:
URL: https://github.com/apache/arrow/issues/34607#issuecomment-1978532812

   > If you make 10 concurrent requests to a Python service, the service can only handle one at a time due to the GIL. So the concurrency will not get you any wall-clock-time speedup. @pvardanis notes that they got the same time whether they used a thread pool or ran the requests sequentially. That is the question I am answering. They've simply hijacked this thread to ask whether asyncio would make a difference: I contend it would not.
   
   Hey @lidavidm, I'm still confused about what happens when I send concurrent requests using a `ThreadPoolExecutor` with `max_workers=16`, since my server appears to handle requests in parallel, in contrast to what you're stating:
   ```
   05-Mar-24 11:58:29 - root - INFO: [📡] Starting server on `grpc://0.0.0.0:8080`...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - INFO: Processing data...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   05-Mar-24 11:58:43 - mac.service.arrow_flight.server - DEBUG: Starting batch processing...
   ```
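   For reference, the client side that produced the interleaved log above is a `ThreadPoolExecutor` with `max_workers=16`. A minimal sketch of that pattern, where `send_request` is a hypothetical stand-in for the real Flight call (e.g. a `do_get`/`do_put`); blocking on the network releases the GIL, which the `time.sleep` here imitates:

   ```python
   import time
   from concurrent.futures import ThreadPoolExecutor

   def send_request(i):
       # Hypothetical stand-in for a Flight client call; sleeping releases
       # the GIL the same way blocking on a network socket does.
       time.sleep(0.2)
       return i

   start = time.monotonic()
   with ThreadPoolExecutor(max_workers=16) as pool:
       results = list(pool.map(send_request, range(16)))
   elapsed = time.monotonic() - start
   print(f"{len(results)} requests in {elapsed:.2f}s")  # ~0.2s, not 16 x 0.2s
   ```

   Because the calls overlap while blocked, all 16 complete in roughly the time of one.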
   The server hosts an ML model of around 6 GB, and the prediction step is CPU-bound. It's still not clear to me what happens with the GIL, and whether Arrow Flight RPC can process requests in parallel or not.
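   On the GIL question, my understanding is: if the prediction step executes pure-Python bytecode, the server threads take turns holding the GIL, so you see interleaving (as in the log above) but little wall-clock speedup; if most of the time is spent inside C extensions that release the GIL (NumPy, much of Arrow), threads can genuinely run in parallel. A self-contained illustration of the pure-Python case (timings are indicative only):

   ```python
   import time
   from concurrent.futures import ThreadPoolExecutor

   def cpu_bound(n):
       # Pure-Python loop: holds the GIL for its whole duration.
       total = 0
       for i in range(n):
           total += i * i
       return total

   N, WORKERS = 200_000, 4

   start = time.monotonic()
   seq = [cpu_bound(N) for _ in range(WORKERS)]
   t_seq = time.monotonic() - start

   start = time.monotonic()
   with ThreadPoolExecutor(max_workers=WORKERS) as pool:
       par = list(pool.map(cpu_bound, [N] * WORKERS))
   t_par = time.monotonic() - start

   # For pure-Python work, the threaded run gives little or no wall-clock
   # speedup under the GIL; C extensions that release the GIL are the exception.
   print(f"sequential: {t_seq:.3f}s  threaded: {t_par:.3f}s")
   ```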
   
   What I've also noticed is that the higher the `max_workers` value and the number of concurrent requests I send (e.g. 100), the more likely the server is to hang for a bit during the prediction step, probably because CPU usage is at its maximum. FWIW, the results are not lost and I still get what I'm supposed to; the responses are just delayed a little, which is of course still way faster than handling the requests sequentially.
   
   Here are some benchmarks I did:
   <img width="302" alt="image" src="https://github.com/apache/arrow/assets/37624791/ea15b410-0010-49d7-abe6-0e7a3008616a">
   `max_workers=1` shows the exact same behavior as sending the requests sequentially.

