Rob Young created KAFKA-19596: --------------------------------- Summary: Log auto topic creation failures more visibly Key: KAFKA-19596 URL: https://issues.apache.org/jira/browse/KAFKA-19596 Project: Kafka Issue Type: Improvement Components: logging Reporter: Rob Young
Hi, I was playing with Share groups on the 4.1.0-rc2 and found it a little opaque to detect a failure to auto create the `__share_group_state` topic due to the replication factor not being satisfiable. I start up a cluster like this (taken from the dockerhub instructions): {code:java} podman run --rm \ -p 9092:9092 \ -e KAFKA_NODE_ID=1 \ -e KAFKA_PROCESS_ROLES=broker,controller \ -e KAFKA_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093 \ -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 \ -e KAFKA_CONTROLLER_LISTENER_NAMES=CONTROLLER \ -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT \ -e KAFKA_CONTROLLER_QUORUM_VOTERS=1@localhost:9093 \ -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \ -e KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1 \ -e KAFKA_TRANSACTION_STATE_LOG_MIN_ISR=1 \ -e KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0 \ -e KAFKA_NUM_PARTITIONS=3 \ apache/kafka:4.1.0-rc2{code} and then run the commands: {code:java} robeyoun:kafka_2.13-4.1.0$ bin/kafka-features.sh --bootstrap-server localhost:9092 upgrade --feature share.version=1 share.version was upgraded to 1. robeyoun:kafka_2.13-4.1.0$ bin/kafka-topics.sh --create --topic my_topic --bootstrap-server localhost:9092 WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both. Created topic my_topic. robeyoun:kafka_2.13-4.1.0$ bin/kafka-console-share-consumer.sh --bootstrap-server localhost:9092 --topic quickstart-events [2025-08-12 09:50:13,130] WARN Share groups and KafkaShareConsumer are part of a preview feature introduced by KIP-932, and are not recommended for use in production. (org.apache.kafka.clients.consumer.internals.ShareConsumerDelegateCreator) [2025-08-12 09:50:13,357] WARN [ShareConsumer clientId=console-share-consumer, groupId=console-share-consumer] The metadata response from the cluster reported a recoverable issue with correlation id 20 : {quickstart-events=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient) {code} the consumer now sits there, appearing to be consuming happily, but the broker is emitting a stream of logs like: {code:java} [2025-08-11 21:51:03,648] INFO Sent auto-creation request for Set(__share_group_state) to the active controller. (kafka.server.DefaultAutoTopicCreationManager) [2025-08-11 21:51:03,648] WARN Received retriable error in find coordinator for InitializeStateHandler using key SharePartitionKey{groupId=console-share-consumer, topicIdPartition=sI9vYBBGSKW_3BfG9ZJWhg:null-1}: The coordinator is not available. (org.apache.kafka.server.share.persister.PersisterStateManager$InitializeStateHandler){code} So there's a clue there that it's repeatedly trying to auto-create the topic. the underlying problem is the default replication factor of 3 for the __share_group_state topic, but I have to crank up the log level to DEBUG (setting -e KAFKA_LOG4J_LOGGERS="kafka=DEBUG") to see the cause: {code:java} [2025-08-11 21:53:58,343] INFO Sent auto-creation request for Set(__share_group_state) to the active controller. (kafka.server.DefaultAutoTopicCreationManager) [2025-08-11 21:53:58,343] WARN Received retriable error in find coordinator for InitializeStateHandler using key SharePartitionKey{groupId=console-share-consumer, topicIdPartition=8kjLkvNzSCy7geIIriJVCQ:null-2}: The coordinator is not available. (org.apache.kafka.server.share.persister.PersisterStateManager$InitializeStateHandler) [2025-08-11 21:53:58,345] DEBUG [broker-1-to-controller-forwarding-channel-manager]: Request CreateTopicsRequestData(topics=[CreatableTopic(name='__share_group_state', numPartitions=50, replicationFactor=3, assignments=[], configs=[CreatableTopicConfig(name='compression.type', value='producer'), CreatableTopicConfig(name='cleanup.policy', value='delete'), CreatableTopicConfig(name='min.insync.replicas', value='2'), CreatableTopicConfig(name='segment.bytes', value='104857600'), CreatableTopicConfig(name='retention.ms', value='-1')])], timeoutMs=30000, validateOnly=false) received ClientResponse(receivedTimeMs=1754949238345, latencyMs=2, disconnected=false, timedOut=false, requestHeader=RequestHeader(apiKey=CREATE_TOPICS, apiVersion=7, clientId=1, correlationId=22, headerVersion=2), responseBody=CreateTopicsResponseData(throttleTimeMs=0, topics=[CreatableTopicResult(name='__share_group_state', topicId=AAAAAAAAAAAAAAAAAAAAAA, errorCode=38, errorMessage='Unable to replicate the partition 3 time(s): The target replication factor of 3 cannot be reached because only 1 broker(s) are registered.', topicConfigErrorCode=0, numPartitions=-1, replicationFactor=-1, configs=[])])) (kafka.server.NodeToControllerRequestThread) [2025-08-11 21:53:58,345] DEBUG Cleared inflight topic creation state for HashMap(__share_group_state -> CreatableTopic(name='__share_group_state', numPartitions=50, replicationFactor=3, assignments=[], configs=[CreatableTopicConfig(name='compression.type', value='producer'), CreatableTopicConfig(name='cleanup.policy', value='delete'), CreatableTopicConfig(name='min.insync.replicas', value='2'), CreatableTopicConfig(name='segment.bytes', value='104857600'), CreatableTopicConfig(name='retention.ms', value='-1')])) (kafka.server.DefaultAutoTopicCreationManager) [2025-08-11 21:53:58,345] DEBUG Auto topic creation completed for Set(__share_group_state) with response CreateTopicsResponseData(throttleTimeMs=0, topics=[CreatableTopicResult(name='__share_group_state', topicId=AAAAAAAAAAAAAAAAAAAAAA, errorCode=38, errorMessage='Unable to replicate the partition 3 time(s): The target replication factor of 3 cannot be reached because only 1 broker(s) are registered.', topicConfigErrorCode=0, numPartitions=-1, replicationFactor=-1, configs=[])]). (kafka.server.DefaultAutoTopicCreationManager) {code} The point where we log the auto creation outcome has two cases where it warn logs, but then this case is being handled by a catch all debug log. [https://github.com/apache/kafka/blob/18045c6ac30921503deffbef1744bb365dc599fb/core/src/main/scala/kafka/server/AutoTopicCreationManager.scala#L141] If there's an error code set, maybe it should log at warn like the other cases? I'm happy to be assigned to this. -- This message was sent by Atlassian Jira (v8.20.10#820010)