[jira] [Comment Edited] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18

Matthias Pohl (Jira) Thu, 01 Feb 2024 23:47:07 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-34333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813215#comment-17813215
 ]


Matthias Pohl edited comment on FLINK-34333 at 2/2/24 7:46 AM:
---------------------------------------------------------------

I think it's fine to do the upgrade from v6.6.2 to v6.9.0 even for 1.18:
 * The only change that seems to affect Flink is in the mock framework, so it 
only affects test code
 * Most of the changes are cleanup changes, except for:
 ** Fix [#5262|https://github.com/fabric8io/kubernetes-client/issues/5262]: 
Changed behavior in certain scenarios
 ** Fix [#5125|https://github.com/fabric8io/kubernetes-client/issues/5125]: 
Different default TLS version
 ** Fix [#1335|https://github.com/fabric8io/kubernetes-client/issues/1335]: 
Different fallback used for proxy configuration

Generally, we could argue that downstream projects should rely on the shaded 
dependencies provided by Flink. And fixing the bug outweighs the stability 
concerns here. Do you see a problem with this argument [~gyfora], 
[~wangyang0918]?

The alternatives to doing the upgrade are:
 * Reverting the upgrade (i.e. going back from v6.6.2 to v5.12.4). This would 
allow us to get to the stable version that was tested with 1.17.
 * Providing a Flink-customized implementation of the fabric8io {{LeaderElector}} 
class with the cherry-picked changes of 
[8f8c438f|https://github.com/fabric8io/kubernetes-client/commit/8f8c438f] and 
[0f6c6965|https://github.com/fabric8io/kubernetes-client/commit/0f6c6965]. As a 
consequence, we would stick to fabric8io:kubernetes-client v6.6.2


was (Author: mapohl):
I think it's fine to do the upgrade from v6.6.2 to v6.9.0 even for 1.18:
 * The only change that seems to affect Flink is in the mock framework, so it 
only affects test code
 * Most of the changes are cleanup changes, except for:
 ** Fix [#5262|https://github.com/fabric8io/kubernetes-client/issues/5262]: 
Changed behavior in certain scenarios
 ** Fix [#5125|https://github.com/fabric8io/kubernetes-client/issues/5125]: 
Different default TLS version
 ** Fix [#1335|https://github.com/fabric8io/kubernetes-client/issues/1335]: 
Different fallback used for proxy configuration

Generally, we could argue that downstream projects should rely on the transitive 
dependencies provided by Flink. And fixing the bug outweighs the stability 
concerns here. Do you see a problem with this argument [~gyfora], 
[~wangyang0918]?

The alternatives to doing the upgrade are:
 * Reverting the upgrade (i.e. going back from v6.6.2 to v5.12.4). This would 
allow us to get to the stable version that was tested with 1.17.
 * Providing a Flink-customized implementation of the fabric8io {{LeaderElector}} 
class with the cherry-picked changes of 
[8f8c438f|https://github.com/fabric8io/kubernetes-client/commit/8f8c438f] and 
[0f6c6965|https://github.com/fabric8io/kubernetes-client/commit/0f6c6965]. As a 
consequence, we would stick to fabric8io:kubernetes-client v6.6.2

> Fix FLINK-34007 LeaderElector bug in 1.18
> -----------------------------------------
>
>                 Key: FLINK-34333
>                 URL: https://issues.apache.org/jira/browse/FLINK-34333
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.18.1
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Blocker
>              Labels: pull-request-available
>
> FLINK-34007 revealed a bug in the k8s client v6.6.2, which we have been using 
> since Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19, which 
> required an update of the k8s client to v6.9.0.
> This Jira issue is about finding a solution in Flink 1.18 for the very same 
> problem FLINK-34007 covered. It's a dedicated Jira issue because we want to 
> unblock the release of 1.19 by resolving FLINK-34007.
> Just to summarize why the upgrade to v6.9.0 is desired: There's a bug in 
> v6.6.2 which might prevent the leadership lost event from being forwarded to 
> the client ([#5463|https://github.com/fabric8io/kubernetes-client/issues/5463]). 
> An initial proposal where the release call was handled in Flink's 
> {{KubernetesLeaderElector}} didn't work due to the leadership lost event 
> being triggered twice (see [FLINK-34007 PR 
> comment|https://github.com/apache/flink/pull/24132#discussion_r1467175902])
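
To illustrate the "leadership lost event being triggered twice" problem 
mentioned in the description, here is a minimal Java sketch of a guard that 
forwards the event at most once per leadership term. It is not the approach 
taken in FLINK-34007 and uses neither Flink's nor fabric8io's API: the class 
{{LeadershipLostGuard}} and its methods are hypothetical names chosen for 
this example only.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper (not part of Flink or fabric8io): forwards a
 * "leadership lost" notification at most once per leadership term,
 * even if the underlying source reports the loss more than once.
 */
final class LeadershipLostGuard {

    // true while this instance believes it holds leadership
    private final AtomicBoolean isLeader = new AtomicBoolean(false);

    // callback to run when leadership is lost (assumed to be cheap and non-blocking)
    private final Runnable onLeadershipLost;

    LeadershipLostGuard(Runnable onLeadershipLost) {
        this.onLeadershipLost = onLeadershipLost;
    }

    /** Marks the start of a new leadership term. */
    void grantLeadership() {
        isLeader.set(true);
    }

    /**
     * Forwards the loss event only if leadership was actually held.
     *
     * @return true if the callback was invoked, false if the event was a duplicate
     */
    boolean revokeLeadership() {
        // compareAndSet lets exactly one caller flip true -> false per term
        if (isLeader.compareAndSet(true, false)) {
            onLeadershipLost.run();
            return true;
        }
        return false;
    }
}
```

With such a guard, a second {{revokeLeadership()}} call in the same term is 
ignored instead of reaching the listener again. Whether deduplication of this 
kind would have been sufficient for the initial proposal is not something this 
sketch can answer.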



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
