shunping commented on issue #35867:
URL: https://github.com/apache/beam/issues/35867#issuecomment-3348591262

   Good news! I am able to reproduce the "DEADLINE_EXCEEDED" error when I set 
the number of rows to 100k. 
   
   I did some investigation and found that it is caused by the timeout of a gRPC 
connection. 
   
   
https://github.com/apache/beam/blob/50e14ace7f6bfb9a28bff59962c2166729adb778/sdks/python/apache_beam/runners/portability/portable_runner.py#L226-L228
   
   By default, we create a gRPC connection with a timeout of 60 seconds:
   
https://github.com/apache/beam/blob/c84f28f84aa4f38cb7209809fd079835c698f0d4/sdks/python/apache_beam/options/pipeline_options.py#L1747-L1757
   
    If the connection is idle for more than one minute, it is cut off and a 
"DEADLINE_EXCEEDED" error is raised. This is a protection mechanism we 
implemented to avoid hanging jobs. 
   
   You can use the pipeline option `job_server_timeout` to override the default 
deadline. I verified that when I set the timeout to 10 minutes, the previously 
failing job (with 100k rows) runs successfully on my end. Could you try that 
and let me know if it works for you too?

