michael-carter-instaclustr commented on pull request #8844: URL: https://github.com/apache/kafka/pull/8844#issuecomment-647356784
Thanks for reviewing this @C0urante.

"if the framework successfully instantiates a connector and is able to call start on it, should that alone qualify as a "successful" startup, or does the call to start also have to go off without a hitch?"

That’s an interesting point and not one that I'd considered. My fundamental assumption in approaching this was that the ‘connector-startup-failure-total’ metric, described in the KIP as ‘The total number of connector starts that failed’, was intended to be a numerical record of failures within the ‘start’ method of the connector (and likewise for the task-based metrics). In other words, these metrics represent the health of the worker in an integration sense (e.g. Does the worker have the right connectivity to do its job? Are people submitting valid configurations, or are the users of Connect not understanding how to use it?). This seems to me like a useful aggregate metric that relates to the use of the worker as a whole more than a record of any individual connector failure.

The way I encountered this was that someone on my team was attempting to run a CloudWatch connector but hadn’t got the configuration quite right, so the ‘start’ method would throw an exception and the connector would enter the failed state every time. However, the worker recorded this as a successful startup, which we all felt to be a bit misleading. Given my interpretation of the metric’s intention, this confused the investigation into what the problem was, and once we found the cause, it seemed like a bug in the metric (and a big surprise).

FWIW, I would say that it still seems like the most natural interpretation of the metric to me, but if this isn’t the intention, then I’d probably suggest renaming the metric to something that de-conflicts its meaning with the connector’s ‘start’ method, e.g. connector-start-preparation-failure-total or something like that (and similarly for task start-ups).
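
To make the interpretation above concrete, here is a minimal sketch of the behaviour I would expect: an exception thrown from ‘start’ counts towards the failure total, and only a clean return counts as a success. This is not the actual Connect worker code. `SketchConnector`, `StartupCounters`, and `startAndRecord` are hypothetical names invented for this illustration, and the two counters merely stand in for the startup success and failure totals.

```java
import java.util.concurrent.atomic.AtomicLong;

public class StartupMetricsSketch {

    /** Hypothetical stand-in for a connector; not the real Kafka Connect Connector API. */
    interface SketchConnector {
        void start() throws Exception;
    }

    /** Hypothetical counters standing in for the startup success/failure totals. */
    static final class StartupCounters {
        final AtomicLong successTotal = new AtomicLong();
        final AtomicLong failureTotal = new AtomicLong();
    }

    /**
     * Calls start() and records the outcome. An exception thrown by start()
     * is counted as a startup failure rather than a success.
     */
    static boolean startAndRecord(SketchConnector connector, StartupCounters counters) {
        try {
            connector.start();
        } catch (Exception e) {
            // start() itself failed (e.g. invalid configuration): record a failure.
            counters.failureTotal.incrementAndGet();
            return false;
        }
        // Only a start() call that returns normally is a successful startup.
        counters.successTotal.incrementAndGet();
        return true;
    }

    public static void main(String[] args) {
        StartupCounters counters = new StartupCounters();
        // A connector whose start() returns normally.
        startAndRecord(() -> { }, counters);
        // A connector whose start() throws, as in the misconfigured case described above.
        startAndRecord(() -> { throw new IllegalArgumentException("bad config"); }, counters);
        System.out.println("success=" + counters.successTotal.get()
                + " failure=" + counters.failureTotal.get());
    }
}
```

Under the alternative reading, where merely being able to call ‘start’ counts as success, the second call would increment the success counter instead, which is the behaviour we found misleading.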