bitflicker64 commented on PR #2961:
URL: https://github.com/apache/hugegraph/pull/2961#issuecomment-4007663614
**How I tested:**
1. Built a local Docker image from source with this fix applied
2. Brought up the 3-node cluster (3 PD + 3 Store + 3 Server) in bridge network mode
3. Confirmed cluster was healthy with pd0 as initial leader
4. Restarted pd0 to force a new leader election — pd1 won
5. Checked partition distribution and cluster health with pd1 as leader
**Results with pd1 as leader:**
```
partitionCount:12 on all 3 stores ✅
leaderCount:12 on all 3 stores ✅
{"graphs":["hugegraph"]} ✅
All 9 containers healthy ✅
```
**Confirmed fallback triggered in pd1 logs:**
```
[WARN] RaftEngine - Failed to get leader gRPC address via RPC, falling back to endpoint derivation
java.util.concurrent.ExecutionException: com.alipay.remoting.exception.RemotingException: Create connection failed. The address is 172.20.0.10:8610
    at RaftEngine.getLeaderGrpcAddress(RaftEngine.java:247)
    at PDService.redirectToLeader(PDService.java:1275)
```
**Before this fix:** The RPC returned null → NPE → follower PDs could not
redirect requests to the leader. The cluster only worked when pd0 won the
leader election, since that case never hit the broken code path.
**After this fix:** The RPC failure is caught with a bounded timeout → fallback
to deriving the address from the leader's endpoint IP + gRPC port → follower PDs
correctly redirect to the leader regardless of which PD node wins the election.
Related docker bridge networking PR: #2952
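For reviewers who want the after-fix behavior in one place, here is a minimal sketch of the pattern described above: a bounded wait on the RPC, then a fallback that derives the address from the leader's raft endpoint. This is not the code from the PR. The class name, method names, the `8686` gRPC port, and the timeout values are all assumptions for illustration, and the sketch assumes every PD node listens for gRPC on the same port.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical illustration only; not the RaftEngine code from the PR.
final class LeaderAddressFallback {

    /**
     * Ask the leader for its gRPC address, waiting at most timeoutMs.
     * On failure, timeout, or a null/empty reply, derive the address
     * from the leader's raft endpoint instead of propagating the error.
     */
    public static String resolve(Supplier<CompletableFuture<String>> rpc,
                                 String leaderRaftEndpoint,
                                 int grpcPort,
                                 long timeoutMs) {
        try {
            String addr = rpc.get().get(timeoutMs, TimeUnit.MILLISECONDS);
            if (addr != null && !addr.isEmpty()) {
                return addr;
            }
            // A null reply is the case that previously led to the NPE.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        } catch (ExecutionException | TimeoutException e) {
            // RPC failed or took too long: fall through to derivation.
        }
        return deriveFromEndpoint(leaderRaftEndpoint, grpcPort);
    }

    /** Replace the raft port in "host:raftPort" with the gRPC port. */
    public static String deriveFromEndpoint(String raftEndpoint, int grpcPort) {
        int idx = raftEndpoint.lastIndexOf(':');
        String host = idx >= 0 ? raftEndpoint.substring(0, idx) : raftEndpoint;
        return host + ":" + grpcPort;
    }

    public static void main(String[] args) {
        // Simulate the connection failure seen in the pd1 logs.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(
                new RuntimeException("Create connection failed"));

        // 8686 is an assumed gRPC port for this example.
        System.out.println(resolve(() -> failed, "172.20.0.10:8610", 8686, 200));
    }
}
```

In this sketch a failed RPC, a slow RPC, and a null reply all take the same fallback path, so a follower always ends up with some address to redirect to.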