BalaMahesh commented on issue #4230: URL: https://github.com/apache/hudi/issues/4230#issuecomment-1094848826
From the logs, it seems the Javalin server is not able to send a response to the client within the timeout period. Would this happen when there are many requests to the server and it is not able to handle them? If that's the case, how can we identify and debug it? However, this is also happening with low-throughput topics, after running fine for a certain period of time.

The bad part is that when we submit the Hudi job using spark-operator on Kubernetes, all the executor pods die and the driver pod is stuck with this error; spark-operator doesn't mark the driver as failed and just leaves it in that state. Once the lag is detected from another source, the application has to be deleted manually and submitted again to restart it. It has become hard to manage Hudi this way. It would be really helpful if a fix could be provided for this behaviour.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
