Marek Svitok created KAFKA-4176:
-----------------------------------

             Summary: Node stopped receiving heartbeat responses once another 
node started within the same group
                 Key: KAFKA-4176
                 URL: https://issues.apache.org/jira/browse/KAFKA-4176
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 0.10.0.1
         Environment: Centos 7: 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Java: java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

            Reporter: Marek Svitok


I have 3 nodes working in the same group. I started them one after the other. 
As I can see from the log the node once started receives heartbeat responses
for the group it is part of. However once I start another node the former one 
stops receiving these responses and the new one keeps receiving them:

Node0
03:14:36.224 [StreamThread-1] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:14:39.223 [StreamThread-2] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:14:39.224 [StreamThread-1] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:14:39.429 [main-SendThread(mujsignal-03:2182)] DEBUG 
org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
0x256bc1ce8c30170 after 0ms
03:14:39.462 [main-SendThread(mujsignal-03:2182)] DEBUG 
org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
0x256bc1ce8c30171 after 0ms
03:14:42.224 [StreamThread-2] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:14:42.224 [StreamThread-1] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:14:45.224 [StreamThread-2] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:14:45.224 [StreamThread-1] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:14:48.224 [StreamThread-2] DEBUG o.a.k.c.c.i.AbstractCoordinator - Attempt 
to heart beat failed for group test_streams_id since it is rebalancing.
03:14:48.224 [StreamThread-2] INFO  o.a.k.c.c.i.ConsumerCoordinator - Revoking 
previously assigned partitions [StreamTopic-2] for group test_streams_id
03:14:48.224 [StreamThread-2] INFO  o.a.k.s.p.internals.StreamThread - Removing 
a task 0_2

Node1
03:22:18.710 [StreamThread-2] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:22:18.716 [StreamThread-1] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:22:21.709 [StreamThread-2] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:22:21.716 [StreamThread-1] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:22:24.710 [StreamThread-2] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:22:24.717 [StreamThread-1] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:22:24.872 [main-SendThread(mujsignal-03:2182)] DEBUG 
org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
0x256bc1ce8c30172 after 0ms
03:22:24.992 [main-SendThread(mujsignal-03:2182)] DEBUG 
org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 
0x256bc1ce8c30173 after 0ms
03:22:27.710 [StreamThread-2] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:22:27.717 [StreamThread-1] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:22:30.710 [StreamThread-2] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id
03:22:30.716 [StreamThread-1] DEBUG o.a.k.c.c.i.AbstractCoordinator - Received 
successful heartbeat response for group test_streams_id

Configuration used:

03:14:24.520 [main] INFO  o.a.k.c.producer.ProducerConfig - ProducerConfig 
values: 
        metric.reporters = []
        metadata.max.age.ms = 300000
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        bootstrap.servers = [mujsignal-03:9092, mujsignal-09:9093]
        ssl.keystore.type = JKS
        sasl.mechanism = GSSAPI
        max.block.ms = 60000
        interceptor.classes = null
        ssl.truststore.password = null
        client.id = Test-Streams-Processor-StreamThread-2-producer
        ssl.endpoint.identification.algorithm = null
        request.timeout.ms = 30000
        acks = 1
        receive.buffer.bytes = 32768
        ssl.truststore.type = JKS
        retries = 0
        ssl.truststore.location = null
        ssl.keystore.password = null
        send.buffer.bytes = 131072
        compression.type = none
        metadata.fetch.timeout.ms = 60000
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        buffer.memory = 33554432
        timeout.ms = 30000
        key.serializer = class 
org.apache.kafka.common.serialization.ByteArraySerializer
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        block.on.buffer.full = false
        ssl.key.password = null
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        max.in.flight.requests.per.connection = 5
        metrics.num.samples = 2
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        batch.size = 16384
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        max.request.size = 1048576
        value.serializer = class 
org.apache.kafka.common.serialization.ByteArraySerializer
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        partitioner.class = class 
org.apache.kafka.clients.producer.internals.DefaultPartitioner
        linger.ms = 100


03:14:24.547 [main] INFO  o.a.k.c.consumer.ConsumerConfig - ConsumerConfig 
values: 
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy = 
[org.apache.kafka.streams.processor.internals.StreamPartitionAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [mujsignal-03:9092, mujsignal-09:9093]
        ssl.keystore.type = JKS
        enable.auto.commit = false
        sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
        ssl.truststore.password = null
        client.id = Test-Streams-Processor-StreamThread-2-consumer
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 2147483647
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class 
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id = test_streams_id
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
        key.deserializer = class 
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = earliest



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to