[jira] [Commented] (KAFKA-17704) possible race condition in TTL credentials when connectors recycled on single node instance

Dmitri Pavlov (Jira) Tue, 08 Oct 2024 08:56:55 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-17704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887643#comment-17887643
 ]


Dmitri Pavlov commented on KAFKA-17704:
---------------------------------------

Regarding the question "Can you explain why this is distinct from 
KAFKA-17627?": thanks for asking.

KAFKA-17704, KAFKA-17627 and KAFKA-9228 all seem to be about the same topic. I 
was just not sure which one to post to. I'm ready to post to whichever one is 
in a working state and has progress. One more thing: the KAFKA-17627 
description says "which may or may not be present on the running machine". 
Please see below: this case is all about one connector machine, not several.

For KAFKA-17704 I added a docker lab in which, as far as I understand, the 
issue is reproducible. If I change the creds (at intervals somewhat longer than 
the TTL), the connector loads them successfully. If, after the settings 
reload, I do not see task-related messages, then the tasks keep using the 
stale creds, and that is the failure.

"{+}So if the task is using expires experiences authentication problems, that's 
probably because it hasn't restarted to pick up the new secrets.{+}" Totally 
agree. This looks to be the problem.

"The new secret printed at 24:38 is triggered by the connector restart, and 
doesn't affect the tasks at all." In the majority of cases (creds changes) the 
connector update does trigger a task restart, roughly 7 out of 10 times (a 
very approximate number). And sometimes, again very roughly 3 out of 10 times, 
the connector update does +not+ trigger a task restart.
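
As an illustration only (not taken from the docker lab or the attached logs), 
here is a minimal Python sketch of how one might check task states and 
explicitly restart the connector together with its tasks through the Kafka 
Connect REST API, for the cases where the connector update did not restart 
them. The worker URL and connector name are placeholders. The restart endpoint 
with includeTasks is the one added in Kafka 3.0 (KIP-745). Whether a restarted 
task then picks up the fresh secrets is an assumption to verify, not a 
confirmed fix for this issue.

```python
import json
import urllib.parse
import urllib.request


def restart_url(base_url: str, connector: str) -> str:
    """Build the REST URL that restarts a connector and all of its tasks."""
    name = urllib.parse.quote(connector, safe="")
    query = urllib.parse.urlencode({"includeTasks": "true", "onlyFailed": "false"})
    return f"{base_url.rstrip('/')}/connectors/{name}/restart?{query}"


def tasks_not_running(status: dict) -> list:
    """Return ids of tasks whose state is not RUNNING in a /status response."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") != "RUNNING"]


def fetch_status(base_url: str, connector: str) -> dict:
    """GET /connectors/{name}/status from the Connect worker."""
    name = urllib.parse.quote(connector, safe="")
    url = f"{base_url.rstrip('/')}/connectors/{name}/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def restart_connector_and_tasks(base_url: str, connector: str) -> int:
    """POST the restart request and return the HTTP status code."""
    req = urllib.request.Request(restart_url(base_url, connector), method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


# Hypothetical usage (placeholder worker URL and connector name):
#   status = fetch_status("http://localhost:8083", "my-connector")
#   if tasks_not_running(status):
#       restart_connector_and_tasks("http://localhost:8083", "my-connector")
```

Note that a task holding stale creds may still report RUNNING while it logs 
authentication errors, so the state check alone may not catch every case.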

> possible race condition in TTL credentials when connectors recycled on single 
> node instance
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17704
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.7.1
>            Reporter: Doug Whitfield
>            Priority: Minor
>         Attachments: For_community.zip.001, For_community.zip.002, 
> For_community.zip.003, For_community.zip.004, For_community.zip.005, 
> For_community.zip.006, For_community.zip.007, 
> image-2024-10-07-11-17-41-951.png, logstoupload.log
>
>
> This is related to https://issues.apache.org/jira/browse/KAFKA-9228 and 
> https://issues.apache.org/jira/browse/KAFKA-17627 but in a single node 
> instance and only related to credentials (as far as we know currently), so 
> maybe something else is in play?
> In some cases, when TTL is used with a single node, passwords are not passed 
> properly.
> In the "logstoupload.log" file you can see that at 09:14 the password does 
> not get changed, but at 09:24 it does get changed.
> We are able to "reliably" reproduce this in a prod-like environment in 
> Kubernetes, which is where this log comes from, but we have only rarely 
> captured this "race condition" in test, where we are not using Kubernetes. 
> We have seen it without Kubernetes though.
> We hope to provide something more reproducible next week, but perhaps 
> uploading this "full" log will allow you to guide us so we can make this more 
> reproducible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
