[jira] [Commented] (FLINK-20249) Rethink the necessity of the k8s internal Service even in non-HA mode

Till Rohrmann (Jira) Wed, 25 Nov 2020 00:33:04 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-20249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238600#comment-17238600
 ]


Till Rohrmann commented on FLINK-20249:
---------------------------------------

[~xintongsong] I am not sure whether we should promote this kind of usage 
scenario, because:

1) it will lead to very surprising results when users lose their data after 
assuming that everything would work as long as they relied on K8s to restart 
the JM;
2) this behaviour would only work on K8s. On Yarn, the old TMs would not be 
able to reconnect to the newly started JM w/o service discovery. Hence, I see 
this as an implementation-specific feature of K8s which we should not promote.

I am not saying we should remove it just to make it symmetric with the other 
implementations, but we should not make this a "public" API.
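To make point 2 concrete, here is a toy model of why old TMs can find a restarted JM on K8s but not without service discovery elsewhere. It is not Flink or Kubernetes code: every class, function, Service name and IP below is hypothetical. It assumes TMs that address the JM through a stable Service name keep working after K8s restarts the JM pod (the Service follows the new pod), while TMs pinned to the old pod IP need some discovery step, as in HA mode, to learn the new address.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: a DNS-like table maps a stable Service name to the current JM pod IP."""
    service_name: str
    jm_pod_ip: str
    dns: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.dns[self.service_name] = self.jm_pod_ip

    def restart_jm(self, new_ip: str) -> None:
        # K8s restarts the JM pod: the pod IP changes, but the Service
        # now routes to whichever pod currently backs it.
        self.jm_pod_ip = new_ip
        self.dns[self.service_name] = new_ip

    def resolve(self, address: str) -> str:
        # Service names resolve through the table; raw IPs resolve to themselves.
        return self.dns.get(address, address)


@dataclass
class TaskManager:
    jm_address: str  # either the Service name or a pinned JM pod IP

    def can_reach_jm(self, cluster: Cluster) -> bool:
        return cluster.resolve(self.jm_address) == cluster.jm_pod_ip


def leader_retrieval(cluster: Cluster) -> str:
    # Hypothetical stand-in for HA service discovery (e.g. ZooKeeper or
    # Kubernetes-based HA services), which publishes the current JM address.
    return cluster.jm_pod_ip


if __name__ == "__main__":
    cluster = Cluster(service_name="my-flink-cluster.default", jm_pod_ip="10.0.0.5")
    via_service = TaskManager(jm_address="my-flink-cluster.default")
    via_pod_ip = TaskManager(jm_address="10.0.0.5")

    cluster.restart_jm(new_ip="10.0.0.9")

    print(via_service.can_reach_jm(cluster))  # True: the Service follows the new pod
    print(via_pod_ip.can_reach_jm(cluster))   # False: the old pod IP is stale
    via_pod_ip.jm_address = leader_retrieval(cluster)
    print(via_pod_ip.can_reach_jm(cluster))   # True: discovery supplied the new IP
```

The model only captures addressing. It deliberately ignores the data-loss concern in point 1: without HA, a restarted JM has no persisted job state even when the TMs can reach it.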

> Rethink the necessity of the k8s internal Service even in non-HA mode
> ---------------------------------------------------------------------
>
>                 Key: FLINK-20249
>                 URL: https://issues.apache.org/jira/browse/FLINK-20249
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.0
>            Reporter: Ruguo Yu
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>         Attachments: k8s internal service - in english.pdf, 
> k8s internal service - v2.pdf, k8s internal service.pdf
>
>
> In non-HA mode, K8s creates an internal Service that directs the 
> communication from the TaskManager pods to the JobManager pod, so TM pods can 
> re-register with the new JM pod once a JM pod failover occurs.
> However, I recently ran an experiment and found that once the JM pod fails 
> over (note: the new JM pod IP has changed), K8s first creates new TM pods and 
> then destroys the old TM pods after a period of time; the JM then reschedules 
> the job onto the new TM pods, which means the new TMs have registered with 
> the JM.
> The internal Service stays active throughout this process, but I don't think 
> it is necessary to keep it. In other words, we can drop the internal Service 
> and let the TM pods communicate with the JM pod via the JM pod IP, which 
> would be consistent with HA mode.
> Finally, the related experiments are in the attachment (k8s internal service.pdf).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
