[ https://issues.apache.org/jira/browse/KAFKA-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rosenberg updated KAFKA-2318:
-----------------------------------
Description:

Using version 0.8.2.1.

During a controlled shutdown, it seems like the left hand is often not talking to the right :)

In this case, we see the ReplicaManager remove a fetcher for a partition, truncate its log, and then apparently try to fetch data from that partition repeatedly, spamming the log with "failed due to Leader not local for partition" warnings.

Below is a snippet (in this case it happened for partition '__consumer_offsets,7' and '__consumer_offsets,47'). It went on for quite a bit longer than included here. The current broker is '99' here.

{code}
2015-07-07 18:54:26,415 INFO [kafka-request-handler-0] server.ReplicaFetcherManager - [ReplicaFetcherManager on broker 99] Removed fetcher for partitions [__consumer_offsets,7]
2015-07-07 18:54:26,415 INFO [kafka-request-handler-0] log.Log - Truncating log __consumer_offsets-7 to offset 0.
2015-07-07 18:54:26,421 WARN [kafka-request-handler-3] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 6832556 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,429 WARN [kafka-request-handler-4] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345717 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,430 WARN [kafka-request-handler-2] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345718 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,431 WARN [kafka-request-handler-4] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345719 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,432 WARN [kafka-request-handler-5] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345720 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,433 WARN [kafka-request-handler-2] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345721 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,434 WARN [kafka-request-handler-3] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345722 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,436 WARN [kafka-request-handler-1] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345723 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,437 WARN [kafka-request-handler-2] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345724 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,438 WARN [kafka-request-handler-7] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345725 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,438 INFO [kafka-request-handler-6] server.ReplicaFetcherManager - [ReplicaFetcherManager on broker 99] Removed fetcher for partitions [__consumer_offsets,47]
2015-07-07 18:54:26,438 INFO [kafka-request-handler-6] log.Log - Truncating log __consumer_offsets-47 to offset 0.
2015-07-07 18:54:26,439 WARN [kafka-request-handler-1] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345726 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,443 WARN [kafka-request-handler-3] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345727 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,446 WARN [kafka-request-handler-5] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 6832559 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,47] failed due to Leader not local for partition [__consumer_offsets,47] on broker 99
2015-07-07 18:54:26,446 WARN [kafka-request-handler-0] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 4345728 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
2015-07-07 18:54:26,447 WARN [kafka-request-handler-1] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request with correlation id 6832560 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,47] failed due to Leader not local for partition [__consumer_offsets,47] on broker 99
2015-07-07 18:54:26,447 WARN [kafka-request-handler-2] server.ReplicaManager - [Replica Manager on Broker 99]: Fetch request
with correlation id 4345729 from client ReplicaFetcherThread-0-99 on partition [__consumer_offsets,7] failed due to Leader not local for partition [__consumer_offsets,7] on broker 99
{code}

> replica manager repeatedly tries to fetch from partitions already moved
> during controlled shutdown
> --------------------------------------------------------------------------------------------------
>
> Key: KAFKA-2318
> URL: https://issues.apache.org/jira/browse/KAFKA-2318
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Rosenberg
>

--
This message was sent by Atlassian JIRA (v6.3.4#6332)