Pranciskus Gecas created KAFKA-13232:
----------------------------------------
Summary: UnknownTopicOrPartitionException for loading topic during
broker startup.
Key: KAFKA-13232
URL: https://issues.apache.org/jira/browse/KAFKA-13232
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.8.0, 2.7.0
Reporter: Pranciskus Gecas
There is race condition in broker where topic that is loading returns
org.apache.kafka.common.errors.UnknownTopicOrPartitionException.
Issue can be reproduced by producing to topic while broker is restarted. Broker
log clearly shows that topic is being loaded, when it is reported as Unknown.
Issue reproduces only after running test multiple times.
{panel:title=Relevant log}
[2021-08-26 10:21:06,278] DEBUG Initializing task scheduler.
(kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,302] INFO [ThrottledChannelReaper-Fetch]: Starting
(kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-08-26 10:21:06,303] INFO [ThrottledChannelReaper-Produce]: Starting
(kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-08-26 10:21:06,305] INFO [ThrottledChannelReaper-Request]: Starting
(kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-08-26 10:21:06,306] INFO [ThrottledChannelReaper-ControllerMutation]:
Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2021-08-26 10:21:06,333] INFO Loading logs from log dirs
ArrayBuffer(/bitnami/kafka/data) (kafka.log.LogManager)
[2021-08-26 10:21:06,335] INFO Skipping recovery for all logs in
/bitnami/kafka/data since clean shutdown file was found (kafka.log.LogManager)
[2021-08-26 10:21:06,342] DEBUG Loading log /bitnami/kafka/data/output-0
(kafka.log.LogManager)
[2021-08-26 10:21:06,379] DEBUG Loaded index file
/bitnami/kafka/data/output-0/00000000000000000000.index with maxEntries = 273,
maxIndexSize = 10485760, entries = 273, lastOffset = 17418, file position =
2184 (kafka.log.OffsetIndex)
[2021-08-26 10:21:06,385] DEBUG Resized
/bitnami/kafka/data/output-0/00000000000000000000.index to 10485760, position
is 2184 and limit is 10485760 (kafka.log.AbstractIndex)
[2021-08-26 10:21:06,387] DEBUG Loaded index file
/bitnami/kafka/data/output-0/00000000000000000000.timeindex with maxEntries =
357, maxIndexSize = 10485760, entries = 357, lastOffset =
TimestampOffset(1629973259582,17452), file position = 4284 (kafka.log.TimeIndex)
[2021-08-26 10:21:06,387] DEBUG Resized
/bitnami/kafka/data/output-0/00000000000000000000.timeindex to 10485756,
position is 4284 and limit is 10485756 (kafka.log.AbstractIndex)
[2021-08-26 10:21:06,399] DEBUG [LeaderEpochCache output-0] Cleared entries
ListBuffer(EpochEntry(epoch=0, startOffset=0)) and rewrote first entry
EpochEntry(epoch=0, startOffset=0) after truncating to start offset 0, leaving
1 in the cache. (kafka.server.epoch.LeaderEpochFileCache)
[2021-08-26 10:21:06,403] INFO [Log partition=output-0,
dir=/bitnami/kafka/data] Loading producer state till offset 17453 with message
format version 2 (kafka.log.Log)
[2021-08-26 10:21:06,404] INFO [ProducerStateManager partition=output-0]
Loading producer state from snapshot file
'SnapshotFile(/bitnami/kafka/data/output-0/00000000000000017453.snapshot,17453)'
(kafka.log.ProducerStateManager)
[2021-08-26 10:21:06,413] DEBUG Scheduling task
PeriodicProducerExpirationCheck with initial delay 600000 ms and period 600000
ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,416] INFO Completed load of
Log(dir=/bitnami/kafka/data/output-0, topic=output, partition=0,
highWatermark=0, lastStableOffset=0, logStartOffset=0, logEndOffset=17453) with
1 segments in 70ms (1/1 loaded in /bitnami/kafka/data) (kafka.log.LogManager)
[2021-08-26 10:21:06,419] INFO Loaded 1 logs in 86ms. (kafka.log.LogManager)
[2021-08-26 10:21:06,419] INFO Starting log cleanup with a period of 1000 ms.
(kafka.log.LogManager)
[2021-08-26 10:21:06,420] DEBUG Scheduling task kafka-log-retention with
initial delay 30000 ms and period 1000 ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,420] INFO Starting log flusher with a default period of
9223372036854775807 ms. (kafka.log.LogManager)
[2021-08-26 10:21:06,421] DEBUG Scheduling task kafka-log-flusher with initial
delay 30000 ms and period 9223372036854775807 ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,421] DEBUG Scheduling task
kafka-recovery-point-checkpoint with initial delay 30000 ms and period 60000
ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,422] DEBUG Scheduling task
kafka-log-start-offset-checkpoint with initial delay 30000 ms and period 60000
ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,423] DEBUG Scheduling task kafka-delete-logs with initial
delay 30000 ms and period -1 ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,739] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2021-08-26 10:21:06,741] INFO Awaiting socket connections on 0.0.0.0:9092.
(kafka.network.Acceptor)
[2021-08-26 10:21:06,768] INFO [SocketServer listenerType=ZK_BROKER, nodeId=1]
Created data-plane acceptor and processors for endpoint :
ListenerName(PLAINTEXT) (kafka.network.SocketServer)
[2021-08-26 10:21:06,768] INFO Updated connection-accept-rate max connection
creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2021-08-26 10:21:06,769] INFO Awaiting socket connections on 0.0.0.0:9093.
(kafka.network.Acceptor)
[2021-08-26 10:21:06,775] INFO [SocketServer listenerType=ZK_BROKER, nodeId=1]
Created data-plane acceptor and processors for endpoint :
ListenerName(DOCKERHOST) (kafka.network.SocketServer)
[2021-08-26 10:21:06,795] INFO [broker-1-to-controller-send-thread]: Starting
(kafka.server.BrokerToControllerRequestThread)
[2021-08-26 10:21:06,795] DEBUG [broker-1-to-controller-send-thread]:
Controller isn't cached, looking for local metadata changes
(kafka.server.BrokerToControllerRequestThread)
[2021-08-26 10:21:06,796] DEBUG [broker-1-to-controller-send-thread]: No
controller defined in metadata cache, retrying after backoff
(kafka.server.BrokerToControllerRequestThread)
[2021-08-26 10:21:06,809] INFO [ExpirationReaper-1-Produce]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-08-26 10:21:06,810] INFO [ExpirationReaper-1-Fetch]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-08-26 10:21:06,811] INFO [ExpirationReaper-1-DeleteRecords]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-08-26 10:21:06,812] INFO [ExpirationReaper-1-ElectLeader]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-08-26 10:21:06,820] DEBUG Scheduling task isr-expiration with initial
delay 0 ms and period 15000 ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,820] DEBUG Scheduling task
shutdown-idle-replica-alter-log-dirs-thread with initial delay 0 ms and period
10000 ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,823] INFO [LogDirFailureHandler]: Starting
(kafka.server.ReplicaManager$LogDirFailureHandler)
[2021-08-26 10:21:06,857] INFO Creating /brokers/ids/1 (is it secure? false)
(kafka.zk.KafkaZkClient)
[2021-08-26 10:21:06,877] INFO Stat of the created znode at /brokers/ids/1 is:
1793,1793,1629973266864,1629973266864,1,0,0,72134560867287128,253,0,1793
(kafka.zk.KafkaZkClient)
[2021-08-26 10:21:06,877] INFO Registered broker 1 at path /brokers/ids/1 with
addresses: PLAINTEXT://kafka-1:9092,DOCKERHOST://127.0.0.1:9093, czxid (broker
epoch): 1793 (kafka.zk.KafkaZkClient)
[2021-08-26 10:21:06,899] DEBUG [broker-1-to-controller-send-thread]:
Controller isn't cached, looking for local metadata changes
(kafka.server.BrokerToControllerRequestThread)
[2021-08-26 10:21:06,899] DEBUG [broker-1-to-controller-send-thread]: No
controller defined in metadata cache, retrying after backoff
(kafka.server.BrokerToControllerRequestThread)
[2021-08-26 10:21:06,924] INFO [ExpirationReaper-1-topic]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-08-26 10:21:06,927] DEBUG Try to create /controller and increment
controller epoch to 89 with expected controller epoch zkVersion 88
(kafka.zk.KafkaZkClient)
[2021-08-26 10:21:06,928] INFO [ExpirationReaper-1-Heartbeat]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-08-26 10:21:06,929] INFO [ExpirationReaper-1-Rebalance]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-08-26 10:21:06,932] DEBUG [ZooKeeperClient Kafka server] Received event:
WatchedEvent state:SyncConnected type:NodeCreated path:/controller
(kafka.zookeeper.ZooKeeperClient)
[2021-08-26 10:21:06,942] INFO [GroupCoordinator 1]: Starting up.
(kafka.coordinator.group.GroupCoordinator)
[2021-08-26 10:21:06,945] DEBUG Initializing task scheduler.
(kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,945] DEBUG Scheduling task delete-expired-group-metadata
with initial delay 0 ms and period 600000 ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,946] INFO [GroupCoordinator 1]: Startup complete.
(kafka.coordinator.group.GroupCoordinator)
[2021-08-26 10:21:06,960] DEBUG [ProducerId Manager 1]: Read current
producerId block
(brokerId:1,blockStartProducerId:87000,blockEndProducerId:87999), Zk path
version 88 (kafka.coordinator.transaction.ProducerIdManager)
[2021-08-26 10:21:06,967] DEBUG Conditional update of path
/latest_producer_id_block with value
Unknown macro: \{"version"}
and expected version 88 succeeded, returning the new version: 89
(kafka.zk.KafkaZkClient)
[2021-08-26 10:21:06,968] INFO [ProducerId Manager 1]: Acquired new producerId
block (brokerId:1,blockStartProducerId:88000,blockEndProducerId:88999) by
writing to Zk with path version 89
(kafka.coordinator.transaction.ProducerIdManager)
[2021-08-26 10:21:06,968] INFO [TransactionCoordinator id=1] Starting up.
(kafka.coordinator.transaction.TransactionCoordinator)
[2021-08-26 10:21:06,968] DEBUG Initializing task scheduler.
(kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,969] DEBUG Scheduling task transaction-abort with initial
delay 10000 ms and period 10000 ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,971] DEBUG Scheduling task transactionalId-expiration
with initial delay 3600000 ms and period 3600000 ms.
(kafka.utils.KafkaScheduler)
[2021-08-26 10:21:06,972] INFO [Transaction Marker Channel Manager 1]:
Starting (kafka.coordinator.transaction.TransactionMarkerChannelManager)
[2021-08-26 10:21:06,972] INFO [TransactionCoordinator id=1] Startup complete.
(kafka.coordinator.transaction.TransactionCoordinator)
[2021-08-26 10:21:06,998] DEBUG [Controller id=1, targetBrokerId=1] Initiating
connection to node kafka-1:9092 (id: 1 rack: null) using address
kafka-1/172.18.0.3 (org.apache.kafka.clients.NetworkClient)
[2021-08-26 10:21:07,000] INFO [ExpirationReaper-1-AlterAcls]: Starting
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-08-26 10:21:07,000] DEBUG [broker-1-to-controller-send-thread]:
Controller isn't cached, looking for local metadata changes
(kafka.server.BrokerToControllerRequestThread)
[2021-08-26 10:21:07,000] DEBUG [broker-1-to-controller-send-thread]: No
controller defined in metadata cache, retrying after backoff
(kafka.server.BrokerToControllerRequestThread)
[2021-08-26 10:21:07,003] DEBUG [Controller id=1, targetBrokerId=1] Created
socket with SO_RCVBUF = 65536, SO_SNDBUF = 1313280, SO_TIMEOUT = 0 to node 1
(org.apache.kafka.common.network.Selector)
[2021-08-26 10:21:07,003] DEBUG [Controller id=1, targetBrokerId=1] Completed
connection to node 1. Ready. (org.apache.kafka.clients.NetworkClient)
[2021-08-26 10:21:07,025] DEBUG [Controller id=1, targetBrokerId=1] Sending
UPDATE_METADATA request with header RequestHeader(apiKey=UPDATE_METADATA,
apiVersion=7, clientId=1, correlationId=0) and timeout 30000 to node 1:
UpdateMetadataRequestData(controllerId=1, controllerEpoch=89, brokerEpoch=1793,
ungroupedPartitionStates=[], topicStates=[],
liveBrokers=[UpdateMetadataBroker(id=1, v0Host='', v0Port=0,
endpoints=[UpdateMetadataEndpoint(port=9092, host='kafka-1',
listener='PLAINTEXT', securityProtocol=0), UpdateMetadataEndpoint(port=9093,
host='127.0.0.1', listener='DOCKERHOST', securityProtocol=0)], rack=null)])
(org.apache.kafka.clients.NetworkClient)
[2021-08-26 10:21:07,026] INFO [/config/changes-event-process-thread]:
Starting
(kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2021-08-26 10:21:07,033] DEBUG Removing leader.replication.throttled.replicas
from broker 1 for topic output (kafka.server.TopicConfigHandler)
[2021-08-26 10:21:07,034] DEBUG Removing
follower.replication.throttled.replicas from broker 1 for topic output
(kafka.server.TopicConfigHandler)
[2021-08-26 10:21:07,038] DEBUG Initializing task scheduler.
(kafka.utils.KafkaScheduler)
[2021-08-26 10:21:07,038] DEBUG Scheduling task auto-leader-rebalance-task
with initial delay 5000 ms and period -1000 ms. (kafka.utils.KafkaScheduler)
[2021-08-26 10:21:07,041] INFO [SocketServer listenerType=ZK_BROKER, nodeId=1]
Starting socket server acceptors and processors (kafka.network.SocketServer)
[2021-08-26 10:21:07,042] DEBUG [SocketServer listenerType=ZK_BROKER,
nodeId=1] Wait for authorizer to complete start up on listener
ListenerName(PLAINTEXT) (kafka.network.SocketServer)
[2021-08-26 10:21:07,043] DEBUG [SocketServer listenerType=ZK_BROKER,
nodeId=1] Start processors on listener ListenerName(PLAINTEXT)
(kafka.network.SocketServer)
[2021-08-26 10:21:07,045] DEBUG [SocketServer listenerType=ZK_BROKER,
nodeId=1] Start acceptor thread on listener ListenerName(PLAINTEXT)
(kafka.network.SocketServer)
[2021-08-26 10:21:07,046] INFO [SocketServer listenerType=ZK_BROKER, nodeId=1]
Started data-plane acceptor and processor(s) for endpoint :
ListenerName(PLAINTEXT) (kafka.network.SocketServer)
[2021-08-26 10:21:07,046] DEBUG [SocketServer listenerType=ZK_BROKER,
nodeId=1] Wait for authorizer to complete start up on listener
ListenerName(DOCKERHOST) (kafka.network.SocketServer)
[2021-08-26 10:21:07,046] DEBUG [SocketServer listenerType=ZK_BROKER,
nodeId=1] Start processors on listener ListenerName(DOCKERHOST)
(kafka.network.SocketServer)
[2021-08-26 10:21:07,047] DEBUG [SocketServer listenerType=ZK_BROKER,
nodeId=1] Start acceptor thread on listener ListenerName(DOCKERHOST)
(kafka.network.SocketServer)
[2021-08-26 10:21:07,048] INFO [SocketServer listenerType=ZK_BROKER, nodeId=1]
Started data-plane acceptor and processor(s) for endpoint :
ListenerName(DOCKERHOST) (kafka.network.SocketServer)
[2021-08-26 10:21:07,049] INFO [SocketServer listenerType=ZK_BROKER, nodeId=1]
Started socket server acceptors and processors (kafka.network.SocketServer)
[2021-08-26 10:21:07,052] DEBUG Accepted connection from /172.18.0.1:46152 on
/172.18.0.3:9093 and assigned it to processor 3, sendBufferSize
[actual|requested]: [102400|102400] recvBufferSize [actual|requested]:
[102400|102400] (kafka.network.Acceptor)
[2021-08-26 10:21:07,052] DEBUG Accepted connection from /172.18.0.3:59924 on
/172.18.0.3:9092 and assigned it to processor 0, sendBufferSize
[actual|requested]: [102400|102400] recvBufferSize [actual|requested]:
[102400|102400] (kafka.network.Acceptor)
[2021-08-26 10:21:07,055] DEBUG Processor 0 listening to new connection from
/172.18.0.3:59924 (kafka.network.Processor)
[2021-08-26 10:21:07,055] DEBUG Processor 3 listening to new connection from
/172.18.0.1:46152 (kafka.network.Processor)
[2021-08-26 10:21:07,055] INFO Kafka version: 2.8.0
(org.apache.kafka.common.utils.AppInfoParser)
[2021-08-26 10:21:07,056] INFO Kafka commitId: ebb1d6e21cc92130
(org.apache.kafka.common.utils.AppInfoParser)
[2021-08-26 10:21:07,056] INFO Kafka startTimeMs: 1629973267049
(org.apache.kafka.common.utils.AppInfoParser)
[2021-08-26 10:21:07,058] INFO [KafkaServer id=1] started
(kafka.server.KafkaServer)
[2021-08-26 10:21:07,078] DEBUG [Controller id=1, targetBrokerId=1] Received
UPDATE_METADATA response from node 1 for request with header
RequestHeader(apiKey=UPDATE_METADATA, apiVersion=7, clientId=1,
correlationId=0): UpdateMetadataResponseData(errorCode=0)
(org.apache.kafka.clients.NetworkClient)
[2021-08-26 10:21:07,082] DEBUG [Controller id=1, targetBrokerId=1] Sending
LEADER_AND_ISR request with header RequestHeader(apiKey=LEADER_AND_ISR,
apiVersion=5, clientId=1, correlationId=1) and timeout 30000 to node 1:
LeaderAndIsrRequestData(controllerId=1, controllerEpoch=89, brokerEpoch=1793,
type=0, ungroupedPartitionStates=[],
topicStates=[LeaderAndIsrTopicState(topicName='output',
topicId=te1YAUx9SiikZgwJoypbWg,
partitionStates=[LeaderAndIsrPartitionState(topicName='output',
partitionIndex=0, controllerEpoch=1, leader=1, leaderEpoch=0, isr=[1],
zkVersion=0, replicas=[1], addingReplicas=[], removingReplicas=[],
isNew=false)])], liveLeaders=[LeaderAndIsrLiveLeader(brokerId=1,
hostName='kafka-1', port=9092)]) (org.apache.kafka.clients.NetworkClient)
[2021-08-26 10:21:07,100] DEBUG [broker-1-to-controller-send-thread]:
Controller isn't cached, looking for local metadata changes
(kafka.server.BrokerToControllerRequestThread)
[2021-08-26 10:21:07,101] INFO [broker-1-to-controller-send-thread]: Recorded
new controller, from now on will use broker kafka-1:9092 (id: 1 rack: null)
(kafka.server.BrokerToControllerRequestThread)
[2021-08-26 10:21:07,102] INFO [ReplicaFetcherManager on broker 1] Removed
fetcher for partitions Set(output-0) (kafka.server.ReplicaFetcherManager)
[2021-08-26 10:21:07,104] DEBUG [KafkaApi-1] Produce request with correlation
id 123 from client rdkafka on partition output-0 failed due to
org.apache.kafka.common.errors.UnknownTopicOrPartitionException
(kafka.server.KafkaApis)
[2021-08-26 10:21:07,108] DEBUG [SocketServer listenerType=ZK_BROKER,
nodeId=1] Connection with /172.18.0.1 disconnected
(org.apache.kafka.common.network.Selector)
java.io.EOFException
at
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:97)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:452)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:402)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674)
at
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:576)
at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
at kafka.network.Processor.poll(SocketServer.scala:989)
at kafka.network.Processor.run(SocketServer.scala:892)
at java.base/java.lang.Thread.run(Thread.java:829)
[2021-08-26 10:21:07,111] INFO [Partition output-0 broker=1] Log loaded for
partition output-0 with initial high watermark 17453 (kafka.cluster.Partition)
[2021-08-26 10:21:07,116] DEBUG Scheduling task highwatermark-checkpoint with
initial delay 0 ms and period 5000 ms. (kafka.utils.KafkaScheduler)
{panel}
Configuration:
- KAFKA_BROKER_ID=1
- ALLOW_PLAINTEXT_LISTENER=yes
- KAFKA_CFG_ZOOKEEPER_CONNECT=${KAFKA_ZOOKEEPER_CONNECT:-zookeeper:2181}
- KAFKA_CFG_LISTENERS=PLAINTEXT://0.0.0.0:9092,DOCKERHOST://0.0.0.0:9093
-
KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,DOCKERHOST:PLAINTEXT
-
KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka-1:9092,DOCKERHOST://127.0.0.1:9093
- KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=false
- KAFKA_CFG_TOPIC_METADATA_REFRESH_INTERVAL_MS=10
- KAFKA_CFG_METADATA_REFRESH_INTERVAL_MS=10
- KAFKA_CFG_LOG_RETENTION_CHECK_INTERVAL_MS=1000
- KAFKA_CFG_LOG_CLEANER_BACKOFF_MS=1000
--
This message was sent by Atlassian Jira
(v8.3.4#803005)