dclim commented on a change in pull request #7428: Add errors and state to stream supervisor status API endpoint URL: https://github.com/apache/incubator-druid/pull/7428#discussion_r281377206
########## File path: docs/content/development/extensions-core/kafka-ingestion.md ##########
@@ -214,6 +214,26 @@ offsets as reported by Kafka, the consumer lag per partition, as well as the agg
consumer lag per partition may be reported as negative values if the supervisor has not received a recent latest offset response from Kafka. The aggregate lag value will always be >= 0.
+The status report also contains the supervisor's state and a list of recently thrown exceptions (whose max size can be

Review comment:
Well, everything but `UNHEALTHY_TASKS` is mutually exclusive. I think the value is in providing a bit more information to help with debugging, mainly in distinguishing between the 'unable to connect' and 'lost contact' cases. If you didn't distinguish these cases and just had a list of recent exceptions, how would you be able to tell whether this supervisor ever worked and is possibly suffering a 'transient' issue, other than by looking through the logs? The 'unhealthy supervisor' case is then a necessary third option to handle exceptions that don't fall into either of the first two categories because they're not stream-related. 'Unhealthy tasks' is more of a nice-to-have: with it, monitoring systems don't have to additionally parse the responses of the task API endpoints to figure out that a bunch of tasks are failing.
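A minimal sketch of the classification argued for above, written only to illustrate the reasoning and not taken from the PR. The class `SupervisorStateSketch`, the `StreamError` marker type, the `max_stored_exceptions` cap, the task-failure `threshold`, and the report keys are all hypothetical. It models the three mutually exclusive states as one enum value and treats unhealthy tasks as a separate flag, since that is the one condition that can coexist with the others. The key idea is remembering whether the stream was ever reached: that single bit is what separates 'unable to connect' from 'lost contact'.

```python
from collections import deque
from enum import Enum


class State(Enum):
    # Hypothetical primary states; only one applies at a time.
    RUNNING = "RUNNING"
    UNABLE_TO_CONNECT_TO_STREAM = "UNABLE_TO_CONNECT_TO_STREAM"
    LOST_CONTACT_WITH_STREAM = "LOST_CONTACT_WITH_STREAM"
    UNHEALTHY_SUPERVISOR = "UNHEALTHY_SUPERVISOR"


class StreamError(Exception):
    """Hypothetical marker for failures talking to the stream (e.g. Kafka)."""


class SupervisorStateSketch:
    def __init__(self, max_stored_exceptions=3):
        # Bounded history: the oldest exception is dropped once the cap is hit.
        self.recent_exceptions = deque(maxlen=max_stored_exceptions)
        self.ever_connected = False
        self.state = State.RUNNING
        # Unhealthy tasks are tracked separately because that condition
        # is not mutually exclusive with the primary state.
        self.tasks_unhealthy = False

    def record_success(self):
        # A successful cycle proves the stream was reachable at least once.
        self.ever_connected = True
        self.state = State.RUNNING

    def record_failure(self, exc):
        self.recent_exceptions.append(exc)
        if isinstance(exc, StreamError):
            # "Never worked" vs. "worked, then lost contact".
            self.state = (State.LOST_CONTACT_WITH_STREAM if self.ever_connected
                          else State.UNABLE_TO_CONNECT_TO_STREAM)
        else:
            # Anything not stream-related falls into the catch-all state.
            self.state = State.UNHEALTHY_SUPERVISOR

    def record_task_health(self, failed_tasks, threshold=3):
        # Hypothetical rule: flag tasks as unhealthy past a failure count.
        self.tasks_unhealthy = failed_tasks >= threshold

    def report(self):
        return {
            "state": self.state.value,
            "unhealthyTasks": self.tasks_unhealthy,
            "recentErrors": [str(e) for e in self.recent_exceptions],
        }
```

With this shape, a monitoring system can read `state` and `unhealthyTasks` from a single report instead of inferring history from logs or querying the task endpoints separately.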
