[
https://issues.apache.org/jira/browse/FLINK-22636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann updated FLINK-22636:
----------------------------------
Description:
In order to better clean up ZooKeeper HA services, I suggest grouping
job-specific services under a common {{jobs/<JobID>}} zNode. That way, it
becomes trivial to clean up the job-specific ZooKeeper data (simply delete
the {{jobs/<JobID>}} node).
Currently, our ZooKeeper layout is not well structured. The current
layout looks like this:
{code}
clusterID -> jobgraphs -> <job-id>
          -> checkpoints -> <job-id> -> checkpoint-1
          -> checkpoint-counter -> <job-id> -> counter
          -> leaderlatch -> dispatcher_lock
                         -> resource_manager_lock
                         -> <job-id>
          -> leader -> dispatcher_lock
                    -> resource_manager_lock
                    -> <job-id>
{code}
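To illustrate why cleanup is awkward today, here is a minimal sketch that lists the zNodes that would have to be removed for a single job under the current layout. It is derived only from the tree above and is not Flink code: the function name {{current_layout_job_paths}}, the root path {{/flink/default}} and the job ID {{job-42}} are assumptions made up for the example.

```python
from typing import List


def current_layout_job_paths(cluster_root: str, job_id: str) -> List[str]:
    """Return the job-specific zNodes scattered across the current layout.

    Hypothetical helper: the parent names are taken from the tree in this
    issue, everything else is illustrative.
    """
    parents = [
        "jobgraphs",
        "checkpoints",
        "checkpoint-counter",
        "leaderlatch",
        "leader",
    ]
    return [f"{cluster_root}/{parent}/{job_id}" for parent in parents]


if __name__ == "__main__":
    # Each printed path lives under a different parent and needs its own delete.
    for path in current_layout_job_paths("/flink/default", "job-42"):
        print(path)
```

Five separate parents each hold a {{<job-id>}} child, so a cleanup routine has to know about every one of them.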
The new layout could look like this:
{code}
clusterID -> jobgraphs -> <job-id>
          -> jobs -> <job-id> -> checkpoints -> checkpoint-1
                              -> checkpoint_id_counter -> counter
                              -> leader -> latch
                                        -> connection_info
          -> leader -> dispatcher -> latch
                                  -> connection_info
                    -> resource_manager -> latch
                                        -> connection_info
{code}
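A rough sketch of the intended effect, using an in-memory model of the proposed tree rather than a real ZooKeeper client: removing one job becomes a single subtree deletion. The names {{build_new_layout}} and {{delete_subtree}} and the job IDs are hypothetical and exist only for this illustration.

```python
def build_new_layout(job_ids):
    """Build a nested-dict model of the proposed layout (illustrative only)."""

    def leader_node():
        # Each leader election entry holds a latch and the connection info.
        return {"latch": {}, "connection_info": {}}

    return {
        "jobgraphs": {job_id: {} for job_id in job_ids},
        "jobs": {
            job_id: {
                "checkpoints": {"checkpoint-1": {}},
                "checkpoint_id_counter": {"counter": {}},
                "leader": leader_node(),
            }
            for job_id in job_ids
        },
        "leader": {
            "dispatcher": leader_node(),
            "resource_manager": leader_node(),
        },
    }


def delete_subtree(tree, path):
    """Remove the node at `path` (a list of names) together with its children."""
    parent = tree
    for name in path[:-1]:
        parent = parent[name]
    parent.pop(path[-1], None)


if __name__ == "__main__":
    cluster = build_new_layout(["job-a", "job-b"])
    # One delete removes checkpoints, counter and leader data of job-a.
    delete_subtree(cluster, ["jobs", "job-a"])
    print(sorted(cluster["jobs"]))  # prints ['job-b']
```

Note that in the layout as proposed, the {{jobgraphs/<job-id>}} entry still sits outside {{jobs/<JobID>}}, so the model above keeps it after the delete.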
> Group job specific ZooKeeper HA services under common jobs/<JobID> zNode
> ------------------------------------------------------------------------
>
> Key: FLINK-22636
> URL: https://issues.apache.org/jira/browse/FLINK-22636
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.13.0, 1.14.0, 1.12.3
> Reporter: Till Rohrmann
> Priority: Major
> Fix For: 1.14.0