codelipenghui opened a new pull request #6347: PIP 57: Improve Zookeeper 
Session Timeout Handling
URL: https://github.com/apache/pulsar/pull/6347
 
 
   Master Issue: #<xyz>
   
   ### Motivation
   
   In Pulsar, brokers use Zookeeper both as the configuration store and to 
maintain broker metadata. These two roles are also called the Global Zookeeper 
and the Local Zookeeper, respectively. 
   
   The Global Zookeeper maintains the namespace policies, cluster metadata, and 
partitioned topic metadata. To reduce read operations on Zookeeper, each broker 
keeps a cache of the Global Zookeeper data, which is updated when a znode 
changes. Currently, when a session timeout happens on the Global Zookeeper, a 
new session starts. The broker does not create any EPHEMERAL znodes on the 
Global Zookeeper.
   
   The Local Zookeeper maintains the local cluster metadata, such as broker 
load data, topic ownership data, managed ledger metadata, and Bookie rack 
information. Broker load data and topic ownership data are all stored as 
EPHEMERAL znodes on the Local Zookeeper. Currently, when a session timeout 
happens on the Local Zookeeper, the broker shuts itself down.
   
   Shutting down a broker changes the ownership of the topics that the broker 
owned. However, we have encountered many problems with the current session 
timeout handling, for example when a broker has a long JVM GC pause or when the 
Local Zookeeper is under high workload. The latter in particular may cause all 
brokers to shut down. 
   
   So, the purpose of this proposal is to improve session timeout handling on 
the Local Zookeeper to avoid unnecessary broker shutdowns.
   
   ### Approach
   
   As with the Global Zookeeper session timeout handling, and the Zookeeper 
session timeout handling in BookKeeper, a new session should start when the 
present session times out. 
   
   If a new session fails to start, the broker retries several times. The 
number of retries depends on the broker configuration. If the session still 
cannot be started after that many retries, the broker still needs to shut 
down, since this may indicate a problem with the Zookeeper cluster. The user 
needs to restart the broker after the Zookeeper cluster returns to normal. 
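   
   The retry behaviour described above could be sketched roughly as follows. This is only an illustration: `SessionReconnector`, `maxRetries`, `retryDelayMillis`, and the `startSession` callback are hypothetical names, not existing Pulsar classes or configuration keys.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper; not taken from the Pulsar code base. */
final class SessionReconnector {
    private final int maxRetries;         // assumed to come from broker configuration
    private final long retryDelayMillis;  // assumed pause between attempts

    SessionReconnector(int maxRetries, long retryDelayMillis) {
        this.maxRetries = maxRetries;
        this.retryDelayMillis = retryDelayMillis;
    }

    /**
     * Calls startSession up to maxRetries times.
     * Returns true as soon as one attempt succeeds. Returns false when all
     * attempts fail; the caller is then expected to shut the broker down.
     */
    boolean reconnect(BooleanSupplier startSession) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (startSession.getAsBoolean()) {
                return true;
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(retryDelayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }
}
```

   A caller would shut the broker down only when `reconnect` returns `false`; on `true` it would continue with the per-scenario recovery.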
   
   If a new session starts successfully, the situation is slightly more 
complicated, so I will describe each scenario separately.
   
   
   Topic ownership data handling
   
   The topic ownership data records all namespace bundles that are owned by the 
broker. In Zookeeper, an EPHEMERAL znode is created for each namespace bundle. 
When a session timeout happens on the Local Zookeeper, all EPHEMERAL znodes 
maintained by this broker are deleted automatically. We need some mechanism to 
avoid unnecessary ownership transfer of the bundles. Since the broker caches 
its owned bundles in memory, it can use the cache to re-own the bundles.
   
   First, the broker should check whether the znode for the bundle exists and 
whether the bundle owner is this broker. If the znode exists and the owner is 
this broker, the znode may simply not have been deleted yet. The broker should 
check whether the ephemeral owner is the current session ID. If not, the broker 
should wait for the znode to be deleted.
   
   Then the broker tries to own the bundle. If it succeeds, the bundle is not 
owned by another broker, and the broker should check whether to preload the 
topics under the bundle. If it fails, the bundle is owned by another broker, 
and the broker should unload the bundle.
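   
   The re-own procedure above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the actual implementation: the `Map` stands in for the Local Zookeeper, `putIfAbsent` stands in for an atomic EPHEMERAL znode create, and `BundleReowner`, `OwnerNode`, and `Outcome` are hypothetical names.

```java
import java.util.Map;

/** Hypothetical model of the re-own decision; not Pulsar code. */
final class BundleReowner {

    /** Stand-in for an ownership znode: owning broker plus ephemeral owner session. */
    static final class OwnerNode {
        final String broker;
        final long sessionId;

        OwnerNode(String broker, long sessionId) {
            this.broker = broker;
            this.sessionId = sessionId;
        }
    }

    enum Outcome {
        OWNED,              // this broker owns the bundle in the current session
        WAIT_FOR_DELETION,  // stale znode from the expired session still exists
        UNLOAD              // another broker owns the bundle
    }

    static Outcome reown(Map<String, OwnerNode> znodes, String bundle,
                         String self, long currentSession) {
        OwnerNode existing = znodes.get(bundle);
        if (existing != null && existing.broker.equals(self)) {
            // The znode names this broker: compare its ephemeral owner
            // with the current session ID.
            return existing.sessionId == currentSession
                    ? Outcome.OWNED
                    : Outcome.WAIT_FOR_DELETION;
        }
        // Try to create the ownership node; this only succeeds if none exists.
        OwnerNode previous = znodes.putIfAbsent(bundle, new OwnerNode(self, currentSession));
        return previous == null ? Outcome.OWNED : Outcome.UNLOAD;
    }
}
```

   In this model the broker would retry `reown` after a `WAIT_FOR_DELETION` result once the stale znode is gone, decide whether to preload topics on `OWNED`, and unload the bundle on `UNLOAD`.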
   
   Theoretically, the mechanism can guarantee that the ownership of most 
bundles will not change during the session timeout.
   
   Broker load data handling
   
   The load data is used for namespace bundle load balancing, so its handling 
does not need to be overly complicated. The only effect of missing load data is 
that it interferes with the choice of broker when finding a candidate broker 
for a namespace bundle. Even if the optimal broker is not selected, load 
balancing will continue to relocate the namespace bundles.
   
   So for broker load data handling, we need to guarantee that the broker's 
load data can be reported successfully.
   
   Other scenario handling
   
   There are also other usage scenarios of the Local Zookeeper: the BookKeeper 
client, managed ledger metadata, bookie rack information, and schema metadata. 
None of these scenarios create any EPHEMERAL znodes on Zookeeper. Pulsar 
uses a Zookeeper cache for the Local Zookeeper. The cache is invalidated when 
the session timeout occurs.
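   
   As a rough illustration of invalidating the cache on session timeout, the sketch below drops every cached entry when the session expires, so that later reads reload data through the new session. `LocalZkCacheSketch`, its `loader` function, and `onSessionExpired` are hypothetical names, not Pulsar's actual cache API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache; not Pulsar's real Zookeeper cache. */
final class LocalZkCacheSketch {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // assumed to read a znode's data

    LocalZkCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading it on the first access. */
    String get(String path) {
        return entries.computeIfAbsent(path, loader);
    }

    /** Called when the session expires: drop everything cached so far. */
    void onSessionExpired() {
        entries.clear();
    }

    int size() {
        return entries.size();
    }
}
```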
   
   ### Verifying this change
   
   Tests will be added.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API: (no)
     - The schema: (no)
     - The default values of configurations: (no)
     - The wire protocol: (no)
     - The rest endpoints: (no)
     - The admin cli options: (no)
     - Anything that affects deployment: (no)
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (no)
   
