Ma77Ball opened a new issue, #5718:
URL: https://github.com/apache/texera/issues/5718

   ### What happened?
   Python UDFs that read a dataset file go through 
`DatasetFileDocument.get_presigned_url` and `read_file`, both of which call 
`requests.get(...)` without a `timeout` argument. `requests` applies no timeout 
by default, so if the file service or object store accepts the TCP connection 
but then stalls (hung load balancer, half-open socket, network black hole), the 
call blocks indefinitely and the worker thread hangs with no way to recover. 
Transient transport errors are not retried either.
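
One possible shape for a fix, as a minimal sketch rather than the actual patch: give every call an explicit `(connect, read)` timeout and mount a retrying adapter on a shared `requests.Session`. The helper names `build_session` and `get_with_timeout`, the timeout constants, and the retry settings below are all hypothetical; the issue does not propose specific values.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Hypothetical defaults, chosen only for illustration.
CONNECT_TIMEOUT_S = 5
READ_TIMEOUT_S = 30


def build_session(total_retries: int = 3, backoff_factor: float = 0.5) -> requests.Session:
    """Return a Session whose HTTP(S) adapters retry transient failures."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,       # sleep grows between attempts
        status_forcelist=(502, 503, 504),    # also retry on these gateway errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def get_with_timeout(session, url, **kwargs):
    """GET `url`, applying a default (connect, read) timeout unless one is given."""
    kwargs.setdefault("timeout", (CONNECT_TIMEOUT_S, READ_TIMEOUT_S))
    response = session.get(url, **kwargs)
    response.raise_for_status()
    return response
```

Note that the read timeout in `requests` limits the wait between bytes received, not the total download time, so a large file that keeps streaming is not cut off, while a stalled socket raises `requests.exceptions.ReadTimeout`.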
   
   ### How to reproduce?
   1. Configure a UDF that reads a dataset file via `DatasetFileDocument`.
   2. Point the presign endpoint (or the host of the returned presigned URL) at 
a server that accepts the connection and then never responds (e.g. a 
firewalled/black-holed host).
   3. Run the workflow.
   
   Expected: the read fails after a bounded wait. Actual: `requests.get` blocks 
indefinitely and the worker hangs.
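
The stall can also be simulated locally without a firewalled host. The sketch below is an assumption-laden stand-in for step 2, not part of Texera: `start_black_hole_server` and `timed_get` are hypothetical helpers. The server accepts TCP connections on localhost and never writes a response, and the client shows that passing a `timeout` turns the hang into a `requests.exceptions.Timeout`.

```python
import socket
import threading
import time

import requests


def start_black_hole_server():
    """Listen on an ephemeral localhost port; accept connections, never reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(5)
    held = []  # keep accepted sockets referenced so they stay open

    def accept_loop():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, held


def timed_get(url, timeout):
    """Issue a GET and report whether it timed out and how long it took."""
    start = time.monotonic()
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy env vars so we really hit localhost
        try:
            session.get(url, timeout=timeout)
            outcome = "ok"
        except requests.exceptions.Timeout:
            outcome = "timeout"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    server, _held = start_black_hole_server()
    port = server.getsockname()[1]
    # With timeout=None (the requests default) this call would not return.
    print(timed_get(f"http://127.0.0.1:{port}/", timeout=(1, 1)))
    server.close()
```

Replacing `timeout=(1, 1)` with `timeout=None` reproduces the behavior described under "Actual": the call never returns while the server holds the connection open.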
   
   ### Version/Branch
   main (commit 13b584ce0)

