[ 
https://issues.apache.org/jira/browse/FLINK-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984840#comment-15984840
 ] 

ASF GitHub Bot commented on FLINK-6078:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/3781

    [FLINK-6078] Remove CuratorFramework#close calls from ZooKeeper based HA 
services

    This PR is based on #3622.
    
    The main goal of this PR is to prevent the ZooKeeper based leader election 
and retrieval services from closing the underlying `CuratorFramework` instance 
when a election/retrieval service is closed. This will allow to share a single 
`CuratorFramework` instance among multiple election/retrieval services. This is 
a strict requirement for the Flip-6 work where all election/retrieval services 
are created by a `HighAvailabilityServices` implementation which shares the 
`CuratorFramework` among the created services. The respective changes can be 
found in the `ZooKeeperLeader[Election, Retrieval]Service` classes.
    
    In the existing code we now use as well an instance of 
`HighAvailabilityServices` in order to create the election/retrieval services 
and to manage the `CuratorFramework` instances. The respective changes are 
contained in `JobManager.scala:2036`, `TaskManager.scala:1643`, 
`MesosApplicationMasterRunner.java:299` and 
`YarnApplicationMasterRunner.java:343`. 
    
    In order to create `Leader[Retrieval, Election]Services` for the 
`JobManager`, we need to provide a `JobID` to the `HighAvailabilityServices`. 
Since there is no such `JobID` defined a priori for a `JobManager`, we have 
introduced the `HighAvailabilityServices.DEFAULT_JOB_ID` which is to be used 
with the old distributed components.
    
    We also changed the `FlinkMiniCluster` to use the `EmbeddedHaServices` or 
the `ZooKeeperHaServices` in case of HA. The former service has HA like 
capabilities which allow to dynamically elect new leaders and notify retrievers 
about these changes. This allows to write better integration tests. The 
downside is that we can no longer connect via a `RemoteExecutionEnvironment` to 
a `FlinkMiniCluster`, because there is no way to obtain the current leader 
session id remotely. In order to execute Flink jobs on the `FlinkMiniCluster`, 
we have extended the `TestEnvironment` and the `TestStreamEnvironment` to be 
used in combination with the changed `FlinkMiniCluster`.
    
    Most of the remaining changes adapt test cases to use the 
`EmbeddedHaServices` or to work with the changed `FlinkMiniCluster` 
implementation.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink refactorZooKeeperServices

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3781.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3781
    
----

----


> ZooKeeper based high availability services should not close the underlying 
> CuratorFramework
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-6078
>                 URL: https://issues.apache.org/jira/browse/FLINK-6078
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.3.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>             Fix For: 1.3.0
>
>
> ZooKeeper based high availability tools like 
> {{ZooKeeperLeaderRetrievalService}} and {{ZooKeeperLeaderElectionService}} 
> expect that every instance of the services have a dedicated 
> {{CuratorFramework}} instance assigned. Thus, they also close this 
> {{CuratorFramework}} when the service is closed. This does not play well 
> along with the newly introduced {{HighAvailabilityServices}} which caches a 
> single {{CuratorFramework}} and shares it among all created services. In 
> order to make it work properly together I propose to change the behaviour 
> such that we no longer close the {{CuratorFramework}} clients in the 
> ZooKeeper based services.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to