poorbarcode opened a new pull request, #4128:
URL: https://github.com/apache/bookkeeper/pull/4128

   ### Motivation
   
   ### Background
   - `NetworkTopologyImpl` does not allow multi Bks register with different 
`depthOfAllLeaves.`
   - Regarding `NetworkTopologyImpl,` once a node is added, the 
`depthOfAllLeaves` will be initiated. 
   - When a new BK is registering:
     -  `EnsemblePlacementPolicy.networkTopology.add( newBkNode )`
     - `EnsemblePlacementPolicy.knownBookies.add( newBkNode )`
     - Once `network topology.add(newBkNode)` fail, it will not call 
`knownBookies.add( newBkNode )`
   
   ---
   
   ### The scenarios that would hit bugs.
   
   #### Scenario 1: Reset racks for all BK nodes, for Example: 
   - Register `[BK1,BK2]` with rack `/r_1`
     - `depthOfAllLeaves` is `2` now.
   - Restart `[BK1,BK2]` with rack `/region_1/r_1`
     - the depth of  `/region_1/r_1` is `3`, different with `2`
     - Since `depthOfAllLeaves` of `NetworkTopologyImpl` is still `2`, `[BK1, 
BK2]` could not be added, and get an error `can't add leaf node BK1 at depth 3 
to the topology.`
   
   You can reproduce this by the new test `testRestartBKWithNewRackDepth.`
   
   ---
   
   #### Scenario 2: A race condition caused `depthOfAllLeaves` to be 
initialized with a wrong value.
   
   | Time | BK1 start/shutdown | thread: `ZK main` | worker thread of 
`RegistrationClient`|
   | --- |  --- | --- | --- |
   | 1 | `BK1` start |
   | 2 | | Add a node into `bkInfos` of `RegistrationClient` |
   | 3 | | | Notice `EnsemblePlacementPolicy`: a node added |
   | 4 | `BK1` shutdown |
   | 5 | | Remove the node from `bkInfos` of `RegistrationClient,` `bkInfos` is 
empty now |
   | 6 | | | **(Highlight)** `NetworkTopologyImpl` tries to calculate the 
network location but gets `null,` so use a default value 
`default-region/default-rack`<sup>[1]</sup> |
   | 6 | | | `depthOfAllLeaves` was initialized to `3` |
   | 7 | The node `BK1` could not be added to `NetworkTopologyImpl` anymore |
   ---
   
   **[1]**: 
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/TopologyAwareEnsemblePlacementPolicy.java#L830
   
   ```java
   protected String resolveNetworkLocation(BookieId addr) {
     try {
         return NetUtils.resolveNetworkLocation(dnsResolver, 
bookieAddressResolver.resolve(addr));
     } catch (BookieAddressResolver.BookieIdNotResolvedException err) {
         return NetworkTopology.DEFAULT_REGION_AND_RACK; // 
"/default-region/default-rack"
     }
   }
   ```
   
   ---
   
   The above scenario will cause `EnsemblePlacementPolicy.knownBookies` to be 
an empty collection, leading to error `Not enough non-faulty bookies available`
   <img width="1424" alt="Screenshot 2023-11-09 at 23 49 17" 
src="https://github.com/apache/bookkeeper/assets/25195800/3cc9bf3d-bee2-4bfa-ada7-934526377b40";>
   
   <img width="1122" alt="Screenshot 2023-11-09 at 23 50 09" 
src="https://github.com/apache/bookkeeper/assets/25195800/1d59d45b-e14b-4e46-b7da-6a07f8928772";>
   
   
   ### Changes
   
   Reset `depthOfAllLeaves` after all BKs have been removed once.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to