[
https://issues.apache.org/jira/browse/KAFKA-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861213#comment-17861213
]
Vinicius Vieira dos Santos commented on KAFKA-16986:
----------------------------------------------------
[~jolshan] We managed to make this change, but our development broker has around
2,500 partitions and a large number of topics and consumers, so the log became very
noisy and I was only able to leave it enabled for a few hours to avoid overwhelming
our log collection tools. Unfortunately, during that window I couldn't catch any
application exhibiting the behavior I reported, so I'll have to spin up an isolated
local Kafka to evaluate it, and that could take a while :/
I was able to verify that the logs are also present in our staging (approval) and
production environments, i.e. Kafka installations completely isolated from each
other all show the same log I reported, in the same scenario: for the same topic
and partition, the message "associated topicId changed from null to xxxx" appears
multiple times.
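For the isolated local test, a minimal sketch along these lines should be enough to keep a producer alive across several metadata refreshes and surface the Metadata log line quickly; the broker address, the "PAYMENTS" topic name and the short metadata.max.age.ms are assumptions for the local setup, not our real values:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class TopicIdLogRepro {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed local broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        // Refresh metadata every 5 s so the "Resetting the last seen epoch ..." line,
        // if it is going to appear at all, shows up within seconds instead of minutes.
        props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, 5000);

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 60; i++) {
                producer.send(new ProducerRecord<>("PAYMENTS", ("msg-" + i).getBytes()));
                producer.flush();
                Thread.sleep(1_000); // keep the client alive across several refresh cycles
            }
        }
    }
}
{code}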
> After upgrading to Kafka 3.4.1, the producer constantly produces logs related
> to topicId changes
> ------------------------------------------------------------------------------------------------
>
> Key: KAFKA-16986
> URL: https://issues.apache.org/jira/browse/KAFKA-16986
> Project: Kafka
> Issue Type: Bug
> Components: clients, producer
> Affects Versions: 3.0.1, 3.6.1
> Reporter: Vinicius Vieira dos Santos
> Priority: Minor
> Attachments: image-2024-07-01-09-05-17-147.png, image.png
>
>
> After upgrading the Kafka broker from version 2.7.0 to 3.4.1, we noticed that
> the applications began logging the message "{*}Resetting the last seen epoch
> of partition PAYMENTS-0 to 0 since the associated topicId changed from null
> to szRLmiAiTs8Y0nI8b3Wz1Q{*}" very frequently. From what I understand this
> behavior is not expected, because the topic was not deleted and recreated, so
> the client should simply use the cached data and never hit this log line.
> We have applications with around 15 topics of 40 partitions each, which comes
> to roughly 600 of these log lines every time a metadata update occurs.
> The main thing for me is to know whether this could indicate a problem, or
> whether I can simply raise the log level of the org.apache.kafka.clients.Metadata
> class to WARN without worrying.
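> In case raising the level turns out to be safe, a minimal sketch assuming the
> application logs through log4j2 (other backends would need the equivalent logger
> configuration); the class name is exactly the one from the log line above:
>
> {code:java}
> import org.apache.logging.log4j.Level;
> import org.apache.logging.log4j.core.config.Configurator;
>
> // Silence the per-partition epoch-reset messages, which org.apache.kafka.clients.Metadata
> // emits at INFO, while leaving the rest of the client loggers untouched.
> Configurator.setLevel("org.apache.kafka.clients.Metadata", Level.WARN);
> {code}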
>
> There are other reports of the same behavior, for example:
> [https://stackoverflow.com/questions/74652231/apache-kafka-resetting-the-last-seen-epoch-of-partition-why]
>
> *Some log occurrences over an interval of about 7 hours; each block refers to
> one instance of the application running in Kubernetes*
>
> !image.png!
> *My scenario:*
> *Application:*
> - Java: 21
> - Client: 3.6.1 (also tested on 3.0.1, with the same behavior)
> *Broker:*
> - Cluster running on Kubernetes with the bitnami/kafka:3.4.1-debian-11-r52 image
>
> *Producer Config*
>
> acks = -1
> auto.include.jmx.reporter = true
> batch.size = 16384
> bootstrap.servers = [server:9092]
> buffer.memory = 33554432
> client.dns.lookup = use_all_dns_ips
> client.id = producer-1
> compression.type = gzip
> connections.max.idle.ms = 540000
> delivery.timeout.ms = 30000
> enable.idempotence = true
> interceptor.classes = []
> key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
> linger.ms = 0
> max.block.ms = 60000
> max.in.flight.requests.per.connection = 1
> max.request.size = 1048576
> metadata.max.age.ms = 300000
> metadata.max.idle.ms = 300000
> metric.reporters = []
> metrics.num.samples = 2
> metrics.recording.level = INFO
> metrics.sample.window.ms = 30000
> partitioner.adaptive.partitioning.enable = true
> partitioner.availability.timeout.ms = 0
> partitioner.class = null
> partitioner.ignore.keys = false
> receive.buffer.bytes = 32768
> reconnect.backoff.max.ms = 1000
> reconnect.backoff.ms = 50
> request.timeout.ms = 30000
> retries = 3
> retry.backoff.ms = 100
> sasl.client.callback.handler.class = null
> sasl.jaas.config = [hidden]
> sasl.kerberos.kinit.cmd = /usr/bin/kinit
> sasl.kerberos.min.time.before.relogin = 60000
> sasl.kerberos.service.name = null
> sasl.kerberos.ticket.renew.jitter = 0.05
> sasl.kerberos.ticket.renew.window.factor = 0.8
> sasl.login.callback.handler.class = null
> sasl.login.class = null
> sasl.login.connect.timeout.ms = null
> sasl.login.read.timeout.ms = null
> sasl.login.refresh.buffer.seconds = 300
> sasl.login.refresh.min.period.seconds = 60
> sasl.login.refresh.window.factor = 0.8
> sasl.login.refresh.window.jitter = 0.05
> sasl.login.retry.backoff.max.ms = 10000
> sasl.login.retry.backoff.ms = 100
> sasl.mechanism = PLAIN
> sasl.oauthbearer.clock.skew.seconds = 30
> sasl.oauthbearer.expected.audience = null
> sasl.oauthbearer.expected.issuer = null
> sasl.oauthbearer.jwks.endpoint.refresh.ms = 3600000
> sasl.oauthbearer.jwks.endpoint.retry.backoff.max.ms = 10000
> sasl.oauthbearer.jwks.endpoint.retry.backoff.ms = 100
> sasl.oauthbearer.jwks.endpoint.url = null
> sasl.oauthbearer.scope.claim.name = scope
> sasl.oauthbearer.sub.claim.name = sub
> sasl.oauthbearer.token.endpoint.url = null
> security.protocol = SASL_PLAINTEXT
> security.providers = null
> send.buffer.bytes = 131072
> socket.connection.setup.timeout.max.ms = 30000
> socket.connection.setup.timeout.ms = 10000
> ssl.cipher.suites = null
> ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
> ssl.endpoint.identification.algorithm = https
> ssl.engine.factory.class = null
> ssl.key.password = null
> ssl.keymanager.algorithm = SunX509
> ssl.keystore.certificate.chain = null
> ssl.keystore.key = null
> ssl.keystore.location = null
> ssl.keystore.password = null
> ssl.keystore.type = JKS
> ssl.protocol = TLSv1.3
> ssl.provider = null
> ssl.secure.random.implementation = null
> ssl.trustmanager.algorithm = PKIX
> ssl.truststore.certificates = null
> ssl.truststore.location = null
> ssl.truststore.password = null
> ssl.truststore.type = JKS
> transaction.timeout.ms = 60000
> transactional.id = null
> value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
>
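> For reference, a minimal sketch of how the non-default values from the dump above
> would map onto producer construction in code; every value shown is copied verbatim
> from the logged ProducerConfig, the bootstrap address is the placeholder from the
> dump, and the SASL entries are left out because the JAAS line is [hidden]:
>
> {code:java}
> import java.util.Properties;
> import org.apache.kafka.clients.producer.KafkaProducer;
>
> public class ReportedProducerFactory {
>     // Non-default settings taken from the ProducerConfig dump above.
>     static KafkaProducer<byte[], byte[]> build() {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "server:9092");
>         props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
>         props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
>         props.put("acks", "-1");
>         props.put("enable.idempotence", "true");
>         props.put("max.in.flight.requests.per.connection", "1");
>         props.put("compression.type", "gzip");
>         props.put("delivery.timeout.ms", "30000");
>         props.put("retries", "3");
>         props.put("metadata.max.age.ms", "300000");
>         return new KafkaProducer<>(props);
>     }
> }
> {code}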
> If you need any more details, please let me know.