C0urante commented on PR #16496: URL: https://github.com/apache/kafka/pull/16496#issuecomment-2225896922

Thanks for the ping @jolshan 🙂 I could have sworn producers already logged `UNKNOWN_TOPIC_OR_PARTITION` errors during this scenario, so I ran the [ConnectWorkerIntegrationTest::testSourceTaskNotBlockedOnShutdownWithNonExistenTopic](https://github.com/apache/kafka/blob/0ada8fac6869cad8ac33a79032cf5d57bfa2a3ea/connect/runtime/src/test/java/org/apache/kafka/connect/integration/ConnectWorkerIntegrationTest.java#L349) test case locally on the latest trunk to check. I see these `WARN`-level messages being logged:

```
WARN [simple-connector|task-2] [Producer clientId=connector-producer-simple-connector-2] The metadata response from the cluster reported a recoverable issue with correlation id 4 : {nonexistenttopic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1218)
WARN [simple-connector|task-0] [Producer clientId=connector-producer-simple-connector-0] The metadata response from the cluster reported a recoverable issue with correlation id 4 : {nonexistenttopic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1218)
WARN [simple-connector|task-0] [Producer clientId=connector-producer-simple-connector-0] The metadata response from the cluster reported a recoverable issue with correlation id 5 : {nonexistenttopic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1218)
```

I do think we could add a new message with clearer wording to let users know that this may indicate the topic doesn't exist and that the producer will block in `send` until the timeout expires or the topic is created. But IMO this would best be accomplished with a few tweaks:

- Only log this once per invocation of `send`
- Log at `WARN` level
- State that this message can be ignored if the topic has been recently created

As far as a long-term functional fix goes, yes, I think there's been some talk of a small KIP to limit the retry duration specifically for `UNKNOWN_TOPIC_OR_PARTITION` errors, both before `send` returns (which will happen if metadata for the topic partition hasn't been cached yet and cannot be found before the timeout expires) and after a record has been added to a batch (which may happen if cached metadata for the topic partition is used, but the topic is deleted between the last successful metadata fetch and when the record is sent to the broker). Obviously that's out of scope for this PR, so I don't think those plans should cause us to abandon this logging improvement in the meantime.
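Not part of the change proposed here, but as a caller-side illustration of the blocking behavior described above: a minimal sketch of bounding how long `send` can wait on missing metadata with the existing `max.block.ms` and `delivery.timeout.ms` producer configs. The broker address, topic name, and timeout values are placeholders, not anything from this PR.

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BoundedSendExample {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Cap how long send() may block waiting for metadata (e.g. when the topic doesn't exist yet)
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "10000");
        // Cap how long a record already added to a batch may be retried before the send is failed
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "30000");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("nonexistenttopic", "key", "value"); // placeholder topic
            try {
                producer.send(record).get();
            } catch (ExecutionException e) {
                // If the topic never appears, the send eventually fails with a TimeoutException
                System.err.println("Send failed: " + e.getCause());
            }
        }
    }
}
```

This only limits how long a single caller waits; it doesn't distinguish `UNKNOWN_TOPIC_OR_PARTITION` from other retriable metadata errors, which is what the KIP mentioned above would address.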
Thanks for the ping @jolshan 🙂 I could have sworn producers already logged `UNKNOWN_TOPIC_OR_PARTITION` errors during this scenario, so I ran the [ConnectWorkerIntegrationTest::testSourceTaskNotBlockedOnShutdownWithNonExistenTopic](https://github.com/apache/kafka/blob/0ada8fac6869cad8ac33a79032cf5d57bfa2a3ea/connect/runtime/src/test/java/org/apache/kafka/connect/integration/ConnectWorkerIntegrationTest.java#L349) test case locally on the latest trunk to check. I see these `WARN`-level messages being logged: ``` WARN [simple-connector|task-2] [Producer clientId=connector-producer-simple-connector-2] The metadata response from the cluster reported a recoverable issue with correlation id 4 : {nonexistenttopic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1218) WARN [simple-connector|task-0] [Producer clientId=connector-producer-simple-connector-0] The metadata response from the cluster reported a recoverable issue with correlation id 4 : {nonexistenttopic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1218) WARN [simple-connector|task-0] [Producer clientId=connector-producer-simple-connector-0] The metadata response from the cluster reported a recoverable issue with correlation id 5 : {nonexistenttopic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1218) ``` I do think we could add a new message with clearer wording to let users know that this may indicate that the topic doesn't exist and the producer will block in `send` until the timeout expires or the topic is created. But IMO this would best be accomplished with a few tweaks: - Only log this once per invocation of `send` - Log at `WARN` level - State that this message can be ignored if the topic has been recently created And as far as a long-term functional fix goes, yes, I think there's been some talk of a small KIP to limit the retry duration specifically for `UNKNOWN_TOPIC_OR_PARTITION` errors, both before `send` returns (which will happen if metadata for the topic partition hasn't been cached yet and cannot be found before the timeout expires) and after a record has been added to a batch (which may happen if cached metadata for the topic partition is used, but the topic is deleted between the last successful metadata fetch and when the record is sent to the broker). Obviously that's out of scope for this PR, so I don't think that those plans should should cause us to abandon this logging improvement in the meantime. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org