BalaMahesh commented on issue #4230:
URL: https://github.com/apache/hudi/issues/4230#issuecomment-1094848826

   From the logs, it seems the Javalin server is not able to send a response
to the client within the timeout period. Could this happen when there are many
requests to the server and it cannot keep up? If that's the case, how can we
identify and debug it? However, this also happens with low-throughput topics
after running fine for a certain period of time. The worst part is that when
we submit the Hudi job using spark-operator on k8s, all the executor pods die
while the driver pod just gets stuck with this error. spark-operator doesn't
mark the driver as failed; it simply leaves it in that state. Once the lag is
detected from another source, the application has to be deleted manually and
resubmitted to start it again. This makes Hudi hard to manage. It would be
really helpful if a fix could be provided for this behaviour.

