erankor commented on issue #5979: Kafka Indexing Service lagging every hour
URL: https://github.com/apache/incubator-druid/issues/5979#issuecomment-403503558

The issue got much worse today :( Lag reached 30-40 minutes during handoff.

I tried to stop realtime ingestion completely, but it didn't help:

1. Shut down the KIS tasks (I had to kill the tasks manually to get the shutdown to complete)
2. Stopped the overlord, coordinator and middle managers
3. Restarted all the services
4. Re-enabled the KIS tasks

Adding some logs that will hopefully help troubleshoot this:

- [overlord-exception2.txt](https://github.com/apache/incubator-druid/files/2176642/overlord-exception2.txt)
- [overlord-exception3.txt](https://github.com/apache/incubator-druid/files/2176643/overlord-exception3.txt)
- [stalled-task.txt](https://github.com/apache/incubator-druid/files/2176644/stalled-task.txt)
- [overlord-exception1.txt](https://github.com/apache/incubator-druid/files/2176645/overlord-exception1.txt)

1. stalled-task.txt - if I'm reading this correctly, the worker was waiting for ~15 minutes on a request it had issued to the overlord.
2. overlord-exception1-3.txt - exceptions I saw in the overlord logs:
   - a. "io.druid.java.util.common.RetryUtils - Failed on try x" + MySQLIntegrityConstraintViolationException
   - b. "The RuntimeException could not be mapped to a response, re-throwing to the HTTP container" + MySQLIntegrityConstraintViolationException
   - c. "The RuntimeException could not be mapped to a response, re-throwing to the HTTP container" + "Unable to grant lock to inactive Task"

These exceptions seem to happen quite a lot. Here are the counts for a random 2-hour window:

- `journalctl -u druid_over* -S -7200 | grep WARN | grep -c 'io.druid.java.util.common.RetryUtils'` returns 92
- `journalctl -u druid_over* -S -7200 | grep -c 'Unable to grant lock to inactive Task'` returns 135
- `journalctl -u druid_over* -S -7200 | grep -c 'UnableToExecuteStatementException'` returns 179

Any direction you can give me here would be appreciated.

Thank you,
Eran
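Each of the counts above reads the overlord journal separately. As a minimal sketch (not part of Druid or of the original report), a hypothetical shell function `count_overlord_errors` can tally the same three patterns in a single pass over log lines piped on stdin. It assumes only that the patterns appear verbatim in the log lines, exactly as the `grep` filters above do.

```shell
# Hypothetical helper (illustrative only): count log lines matching each of
# the three patterns from the report, in one pass over stdin.
# Prints one "<count> <label>" line per pattern.
count_overlord_errors() {
  awk '
    # Same filter as: grep WARN | grep -c "io.druid.java.util.common.RetryUtils"
    /WARN/ && /io\.druid\.java\.util\.common\.RetryUtils/ { retry++ }
    # Same filter as: grep -c "Unable to grant lock to inactive Task"
    /Unable to grant lock to inactive Task/ { lock++ }
    # Same filter as: grep -c "UnableToExecuteStatementException"
    /UnableToExecuteStatementException/ { stmt++ }
    END {
      # Unset counters print as 0 with %d
      printf "%d RetryUtils WARN\n", retry
      printf "%d inactive-task lock\n", lock
      printf "%d UnableToExecuteStatementException\n", stmt
    }
  '
}
```

Usage, mirroring the original invocation: `journalctl -u druid_over* -S -7200 | count_overlord_errors`. As with three separate `grep -c` runs, a line that matches more than one pattern is counted under each of them.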