bitflicker64 opened a new pull request, #2961:
URL: https://github.com/apache/hugegraph/pull/2961

   
   
   ## Purpose of the PR
   
   - close #2959 
   
   In a 3-node PD cluster running in Docker bridge network mode, `getLeaderGrpcAddress()` makes a bolt RPC call to discover the leader's gRPC address when the current node is a follower. This call fails in bridge mode: the TCP connection is established, but the bolt RPC response never returns properly, so `CompletableFuture.get()` returns null and the following `.getGrpcAddress()` access throws an NPE.
   
   This causes:
   1. `redirectToLeader()` fails with NPE
   2. Store registration requests landing on follower PDs are never forwarded
   3. Stores register but partitions are never distributed (`partitionCount:0`)
   4. HugeGraph servers stay stuck in a `DEADLINE_EXCEEDED` loop indefinitely
   
   The cluster only works when pd0 wins the raft leader election (since `isLeader()` returns true and the broken code path is skipped). If pd1 or pd2 wins, the NPE fires on every redirect attempt.
   
   Related PR: #2952
   
   ## Main Changes
   
   - Add a bounded timeout to the bolt RPC call using `config.getRpcTimeout()` instead of an unbounded `.get()`
   - Add null-check on the RPC response before accessing `.getGrpcAddress()`
   - Fall back to deriving the leader address from the raft endpoint IP + local gRPC port when the RPC fails or times out
   - Add `TimeUnit` and `TimeoutException` imports
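
The changes above can be illustrated with a minimal sketch. This is not the code from the PR: the class `LeaderAddressSketch`, both method names, and the use of a plain `CompletableFuture<String>` in place of the bolt RPC response object are assumptions made only to show the pattern (bounded wait, null check, fallback to raft endpoint host + local gRPC port).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the PD leader lookup; not the actual HugeGraph code.
final class LeaderAddressSketch {

    /**
     * Returns the leader's gRPC address. Prefers the RPC answer and falls back
     * to "<raft endpoint host>:<local gRPC port>" when the RPC times out,
     * fails, or yields null/empty.
     */
    static String resolveLeaderGrpcAddress(CompletableFuture<String> rpcResponse,
                                           long timeoutMs,
                                           String raftEndpoint,
                                           int localGrpcPort) {
        try {
            // Bounded wait instead of an unbounded get().
            String address = rpcResponse.get(timeoutMs, TimeUnit.MILLISECONDS);
            // Null check before the value is used.
            if (address != null && !address.isEmpty()) {
                return address;
            }
        } catch (TimeoutException | ExecutionException e) {
            // RPC timed out or failed: fall through to the fallback below.
        } catch (InterruptedException e) {
            // Preserve the interrupt flag, then use the fallback.
            Thread.currentThread().interrupt();
        }
        return fallbackAddress(raftEndpoint, localGrpcPort);
    }

    /** Builds "host:localGrpcPort" from a raft endpoint of the form "host:raftPort". */
    static String fallbackAddress(String raftEndpoint, int localGrpcPort) {
        int sep = raftEndpoint.lastIndexOf(':');
        String host = sep >= 0 ? raftEndpoint.substring(0, sep) : raftEndpoint;
        return host + ":" + localGrpcPort;
    }
}
```

Note that a fallback of this shape only produces a usable address if the leader listens on the same gRPC port as the local node, which is the assumption implied by "raft endpoint IP + local gRPC port".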
   
   ## Verifying these changes
   
   - [ ] Trivial rework / code cleanup without any test coverage. (No Need)
   - [ ] Already covered by existing tests, such as *(please modify tests here)*.
   - [x] Need tests and can be verified as follows:
       - Deploy 3-node PD cluster in Docker bridge network mode
       - Verify cluster works regardless of which PD node wins raft leader election
       - Confirm stores show `partitionCount:12` on all 3 nodes when pd1 or pd2 is leader
       - Confirm no NPE in pd logs at `getLeaderGrpcAddress`
   
   ## Does this PR potentially affect the following parts?
   
   - [ ]  Dependencies ([add/update license](https://hugegraph.apache.org/docs/contribution-guidelines/contribute/#321-check-licenses) info & [regenerate_known_dependencies.sh](../install-dist/scripts/dependency/regenerate_known_dependencies.sh))
   - [ ]  Modify configurations
   - [ ]  The public API
   - [ ]  Other affects (typed here)
   - [x]  Nope
   
   ## Documentation Status
   
   - [ ]  `Doc - TODO`
   - [ ]  `Doc - Done`
   - [x]  `Doc - No Need`

