Hi, We are seeing an issue with Flink in production. We are running version 1.7. We started seeing sudden lag on Kafka, and the consumers were no longer accepting messages. After enabling debug logging, we saw the errors below: [image: image.jpeg]
I am not sure why this occurs every day. When it happens, I can see that the remaining workers aren't able to handle the load. Unless I restart my jobs, I am unable to resume processing. As a result, there is data loss as well. In the graph below, there is a slight dip in consumption just before 5:30. That is when the incident happens, and it correlates with the logs. [image: image.jpeg] Any pointers or suggestions would be appreciated. Thanks.