[ https://issues.apache.org/jira/browse/KAFKA-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610937#comment-15610937 ]
Mazhar Shaikh commented on KAFKA-4128: -------------------------------------- [2016-10-26 18:36:55,992] INFO KafkaConfig values: request.timeout.ms = 30000 log.roll.hours = 168 inter.broker.protocol.version = 0.9.0.X log.preallocate = false security.inter.broker.protocol = PLAINTEXT controller.socket.timeout.ms = 30000 broker.id.generation.enable = true ssl.keymanager.algorithm = SunX509 ssl.key.password = null log.cleaner.enable = false ssl.provider = null num.recovery.threads.per.data.dir = 2 background.threads = 10 unclean.leader.election.enable = true sasl.kerberos.kinit.cmd = /usr/bin/kinit replica.lag.time.max.ms = 16000 ssl.endpoint.identification.algorithm = null auto.create.topics.enable = false zookeeper.sync.time.ms = 2000 ssl.client.auth = none ssl.keystore.password = null log.cleaner.io.buffer.load.factor = 0.9 offsets.topic.compression.codec = 0 log.retention.hours = 168 log.dirs = /data/kafka/broker-b1 ssl.protocol = TLS log.index.size.max.bytes = 10485760 sasl.kerberos.min.time.before.relogin = 60000 log.retention.minutes = null connections.max.idle.ms = 600000 ssl.trustmanager.algorithm = PKIX offsets.retention.minutes = 1440 max.connections.per.ip = 2147483647 replica.fetch.wait.max.ms = 500 metrics.num.samples = 2 port = 9092 offsets.retention.check.interval.ms = 600000 log.cleaner.dedupe.buffer.size = 134217728 log.segment.bytes = 1073741824 group.min.session.timeout.ms = 6000 producer.purgatory.purge.interval.requests = 1000 min.insync.replicas = 1 ssl.truststore.password = null log.flush.scheduler.interval.ms = 2000 socket.receive.buffer.bytes = 16777216 leader.imbalance.per.broker.percentage = 10 num.io.threads = 32 zookeeper.connect = b1.broker.com:2181,b2.broker.com:2181,zoo3.broker.com:2182 queued.max.requests = 500 offsets.topic.replication.factor = 3 replica.socket.timeout.ms = 30000 offsets.topic.segment.bytes = 104857600 replica.high.watermark.checkpoint.interval.ms = 5000 broker.id = 0 ssl.keystore.location = null listeners = PLAINTEXT://0.0.0.0:9092 log.flush.interval.messages = 20000 principal.builder.class = class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder log.retention.ms = null offsets.commit.required.acks = -1 sasl.kerberos.principal.to.local.rules = [DEFAULT] group.max.session.timeout.ms = 30000 num.replica.fetchers = 16 advertised.listeners = null replica.socket.receive.buffer.bytes = 16777216 delete.topic.enable = true log.index.interval.bytes = 4096 metric.reporters = [] compression.type = producer log.cleanup.policy = delete controlled.shutdown.max.retries = 1 log.cleaner.threads = 1 quota.window.size.seconds = 1 zookeeper.connection.timeout.ms = 6000 offsets.load.buffer.size = 5242880 zookeeper.session.timeout.ms = 30000 ssl.cipher.suites = null authorizer.class.name = sasl.kerberos.ticket.renew.jitter = 0.05 sasl.kerberos.service.name = null controlled.shutdown.enable = true offsets.topic.num.partitions = 50 quota.window.num = 11 message.max.bytes = 1000012 log.cleaner.backoff.ms = 15000 log.roll.jitter.hours = 0 log.retention.check.interval.ms = 30000 replica.fetch.max.bytes = 1048576 log.cleaner.delete.retention.ms = 86400000 fetch.purgatory.purge.interval.requests = 1000 log.cleaner.min.cleanable.ratio = 0.5 offsets.commit.timeout.ms = 5000 zookeeper.set.acl = false log.retention.bytes = 4294967296 offset.metadata.max.bytes = 4096 leader.imbalance.check.interval.seconds = 300 quota.consumer.default = 9223372036854775807 log.roll.jitter.ms = null reserved.broker.max.id = 1000 replica.fetch.backoff.ms = 1000 advertised.host.name = b1.broker.com quota.producer.default = 9223372036854775807 log.cleaner.io.buffer.size = 524288 controlled.shutdown.retry.backoff.ms = 2000 log.dir = /tmp/kafka-logs log.flush.offset.checkpoint.interval.ms = 60000 log.segment.delete.delay.ms = 60000 num.partitions = 96 num.network.threads = 16 socket.request.max.bytes = 104857600 sasl.kerberos.ticket.renew.window.factor = 0.8 log.roll.ms = null ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1] socket.send.buffer.bytes = 16777216 log.flush.interval.ms = 10000 ssl.truststore.location = null log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308 default.replication.factor = 2 metrics.sample.window.ms = 30000 auto.leader.rebalance.enable = true host.name = ssl.truststore.type = JKS advertised.port = null max.connections.per.ip.overrides = replica.fetch.min.bytes = 1 ssl.keystore.type = JKS (kafka.server.KafkaConfig) > Kafka broker losses messages when zookeeper session times out > ------------------------------------------------------------- > > Key: KAFKA-4128 > URL: https://issues.apache.org/jira/browse/KAFKA-4128 > Project: Kafka > Issue Type: Bug > Affects Versions: 0.8.2.1, 0.9.0.1 > Reporter: Mazhar Shaikh > Priority: Critical > Attachments: log.tar.gz > > > Pumping 30k msgs/second after some 6-8 hrs of run below logs are printed and > the messages are lost. > [More than 5k messages are lost on every partitions] > Below are few logs: > [2016-09-06 05:00:42,595] INFO Client session timed out, have not heard from > server in 20903ms for sessionid 0x256fabec47c0003, closing socket connection > and attempting reconnect (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:42,696] INFO zookeeper state changed (Disconnected) > (org.I0Itec.zkclient.ZkClient) > [2016-09-06 05:00:42,753] INFO Partition [topic,62] on broker 4: Shrinking > ISR for partition [topic,62] from 4,2 to 4 (kafka.cluster.Partition) > [2016-09-06 05:00:43,585] INFO Opening socket connection to server > b0/169.254.2.1:2182. Will not attempt to authenticate using SASL (unknown > error) (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:43,586] INFO Socket connection established to > b0/169.254.2.1:2182, initiating session (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:43,587] INFO Unable to read additional data from server > sessionid 0x256fabec47c0003, likely server has closed socket, closing socket > connection and attempting reconnect (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:44,644] INFO Opening socket connection to server > b1/169.254.2.116:2181. Will not attempt to authenticate using SASL (unknown > error) (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:44,651] INFO Socket connection established to > b1/169.254.2.116:2181, initiating session (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:44,658] INFO zookeeper state changed (Expired) > (org.I0Itec.zkclient.ZkClient) > [2016-09-06 05:00:44,659] INFO Initiating client connection, > connectString=b2.broker.com:2181,b1.broker.com:2181,zoo3.broker.com:2182 > sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@37b8e86a > (org.apache.zookeeper.ZooKeeper) > [2016-09-06 05:00:44,659] INFO Unable to reconnect to ZooKeeper service, > session 0x256fabec47c0003 has expired, closing socket connection > (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:44,661] INFO EventThread shut down > (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:44,662] INFO Opening socket connection to server > b2/169.254.2.216:2181. Will not attempt to authenticate using SASL (unknown > error) (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:44,662] INFO Socket connection established to > b2/169.254.2.216:2181, initiating session (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:44,665] ERROR Error handling event ZkEvent[New session > event sent to > kafka.controller.KafkaController$SessionExpirationListener@33b7dedc] > (org.I0Itec.zkclient.ZkEventThread) > java.lang.IllegalStateException: Kafka scheduler has not been started > at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114) > at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86) > at > kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350) > at > kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1108) > at > kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1107) > at > kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1107) > at kafka.utils.Utils$.inLock(Utils.scala:535) > at > kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1107) > at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472) > at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) > [2016-09-06 05:00:44,666] INFO re-registering broker info in ZK for broker 4 > (kafka.server.KafkaHealthcheck) > [2016-09-06 05:00:44,801] INFO Session establishment complete on server > b2/169.254.2.216:2181, sessionid = 0x256fabec47c0005, negotiated timeout = > 15000 (org.apache.zookeeper.ClientCnxn) > [2016-09-06 05:00:44,802] INFO zookeeper state changed (SyncConnected) > (org.I0Itec.zkclient.ZkClient) > [2016-09-06 05:00:44,812] INFO Registered broker 4 at path /brokers/ids/4 > with address b5.broker.com:9092. (kafka.utils.ZkUtils$) > [2016-09-06 05:00:44,813] INFO done re-registering broker > (kafka.server.KafkaHealthcheck) > [2016-09-06 05:00:44,814] INFO Subscribing to /brokers/topics path to watch > for new topics (kafka.server.KafkaHealthcheck) > [2016-09-06 05:00:44,831] INFO Partition [topic,62] on broker 4: Expanding > ISR for partition [topic,62] from 4 to 4,2 (kafka.cluster.Partition) > [2016-09-06 05:00:44,865] INFO New leader is 1 > (kafka.server.ZookeeperLeaderElector$LeaderChangeListener) > [2016-09-06 05:00:45,762] INFO [ReplicaFetcherManager on broker 4] Removed > fetcher for partitions > [topic,63],[topic,62],[topic,58],[topic,52],[topic,56],[topic,49],[topic,60],[topic,54],[topic,48],[topic,61],[topic,50],[topic,53],[topic,59],[topic,57],[topic,51],[topic,55] > (kafka.server.ReplicaFetcherManager) > [2016-09-06 05:00:45,764] INFO Truncating log topic-63 to offset 9971093. > (kafka.log.Log) > [2016-09-06 05:00:45,771] INFO Truncating log topic-62 to offset 9971439. > (kafka.log.Log) > [2016-09-06 05:00:45,773] INFO Truncating log topic-58 to offset 9971298. > (kafka.log.Log) > [2016-09-06 05:00:45,774] INFO Truncating log topic-52 to offset 9740617. > (kafka.log.Log) > [2016-09-06 05:00:45,775] INFO Truncating log topic-56 to offset 9971829. > (kafka.log.Log) > [2016-09-06 05:00:45,777] INFO Truncating log topic-49 to offset 9740178. > (kafka.log.Log) > [2016-09-06 05:00:45,778] INFO Truncating log topic-60 to offset 9971167. > (kafka.log.Log) > [2016-09-06 05:00:45,780] INFO Truncating log topic-54 to offset 9738892. > (kafka.log.Log) > [2016-09-06 05:00:45,781] INFO Truncating log topic-48 to offset 9739499. > (kafka.log.Log) > [2016-09-06 05:00:45,787] INFO Truncating log topic-61 to offset 9972030. > (kafka.log.Log) > [2016-09-06 05:00:45,788] INFO Truncating log topic-50 to offset 9740175. > (kafka.log.Log) > [2016-09-06 05:00:45,793] INFO Truncating log topic-53 to offset 9740074. > (kafka.log.Log) > [2016-09-06 05:00:45,794] INFO Truncating log topic-59 to offset 9971052. > (kafka.log.Log) > [2016-09-06 05:00:45,795] INFO Truncating log topic-57 to offset 9970962. > (kafka.log.Log) -- This message was sent by Atlassian JIRA (v6.3.4#6332)