bitflicker64 commented on PR #2961:
URL: https://github.com/apache/hugegraph/pull/2961#issuecomment-4007663614

   **How I tested:**
   
   1. Built a local Docker image from source with this fix applied
   2. Brought up the 3-node cluster (3 PD + 3 Store + 3 Server) in bridge network mode
   3. Confirmed cluster was healthy with pd0 as initial leader
   4. Restarted pd0 to force a new leader election; pd1 won
   5. Checked partition distribution and cluster health with pd1 as leader
   
   **Results with pd1 as leader:**
   ```
   partitionCount:12 on all 3 stores ✅
   leaderCount:12 on all 3 stores ✅
   {"graphs":["hugegraph"]} ✅
   All 9 containers healthy ✅
   ```
   
   **Confirmed fallback triggered in pd1 logs:**
   ```
   [WARN] RaftEngine - Failed to get leader gRPC address via RPC, falling back to endpoint derivation
   java.util.concurrent.ExecutionException: com.alipay.remoting.exception.RemotingException:
   Create connection failed. The address is 172.20.0.10:8610
       at RaftEngine.getLeaderGrpcAddress(RaftEngine.java:247)
       at PDService.redirectToLeader(PDService.java:1275)
   ```
   
   **Before this fix:** the RPC returned null → NPE → follower PDs could not redirect requests to the leader. The cluster only worked when pd0 won the leader election, since that case never hit the broken code path.
   
   **After this fix:** the RPC failure is caught within a bounded timeout → the leader address falls back to the endpoint IP + gRPC port → follower PDs correctly redirect to the leader regardless of which PD node wins the election.
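
For readers who have not opened the diff, here is a minimal sketch of the fallback pattern described above. It is not the code from this PR: the class `LeaderAddressFallback`, the methods `resolve` and `deriveFromEndpoint`, and the gRPC port value `8686` are all hypothetical names and values chosen for illustration. It only assumes that the leader's raft endpoint has the form `ip:raftPort` and that the gRPC port is known from configuration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; not the RaftEngine code from the PR.
public class LeaderAddressFallback {

    // Build "ip:grpcPort" from a raft endpoint of the form "ip:raftPort".
    static String deriveFromEndpoint(String raftEndpoint, int grpcPort) {
        int colon = raftEndpoint.lastIndexOf(':');
        String host = colon < 0 ? raftEndpoint : raftEndpoint.substring(0, colon);
        return host + ":" + grpcPort;
    }

    // Wait at most timeoutMs for the RPC result; on failure, timeout,
    // or a null/empty answer, fall back to the derived address.
    static String resolve(Future<String> rpcResult, long timeoutMs,
                          String raftEndpoint, int grpcPort) {
        try {
            String address = rpcResult.get(timeoutMs, TimeUnit.MILLISECONDS);
            if (address != null && !address.isEmpty()) {
                return address;
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt flag, then use the fallback.
            Thread.currentThread().interrupt();
        } catch (ExecutionException | TimeoutException e) {
            // Stop waiting on the RPC; a real implementation would log a warning here.
            rpcResult.cancel(true);
        }
        return deriveFromEndpoint(raftEndpoint, grpcPort);
    }

    public static void main(String[] args) {
        // Simulate the failed RPC seen in the pd1 logs.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("Create connection failed"));

        // 8686 is an assumed gRPC port for this example.
        System.out.println(resolve(failed, 1000, "172.20.0.10:8610", 8686));
        // prints "172.20.0.10:8686"
    }
}
```

The null check inside `resolve` matters as much as the exception handling: a successful RPC that returns no address takes the same fallback path instead of propagating a null to the caller.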
   
   Related Docker bridge networking PR: #2952

