bitflicker64 commented on code in PR #2952:
URL: https://github.com/apache/incubator-hugegraph/pull/2952#discussion_r2844649965


##########
hugegraph-server/hugegraph-dist/src/assembly/static/bin/wait-storage.sh:
##########
@@ -70,7 +70,28 @@ done < <(env | sort -r | awk -F= '{ st = index($0, "="); print $1 " " substr($0,
 # wait for storage
 if env | grep '^hugegraph\.' > /dev/null; then
     if [ -n "${WAIT_STORAGE_TIMEOUT_S:-}" ]; then
-        timeout "${WAIT_STORAGE_TIMEOUT_S}s" bash -c \
-        "until bin/gremlin-console.sh -- -e $DETECT_STORAGE > /dev/null 2>&1; do echo \"Hugegraph server are waiting for storage backend...\"; sleep 5; done"
+        # Extract pd.peers from config or environment
+        PD_PEERS="${hugegraph_pd_peers:-}"
+        if [ -z "$PD_PEERS" ]; then
+            PD_PEERS=$(grep -E "^\s*pd\.peers\s*=" "$GRAPH_CONF" | sed 's/.*=\s*//' | tr -d ' ')
+        fi
+
+        if [ -n "$PD_PEERS" ]; then
+            # Convert gRPC address to REST address (8686 -> 8620)
+            PD_REST=$(echo "$PD_PEERS" | sed 's/:8686/:8620/g' | cut -d',' -f1)
+            echo "Waiting for PD REST endpoint at $PD_REST..."
+
+            timeout "${WAIT_STORAGE_TIMEOUT_S}s" bash -c "
+                until curl -fsS http://${PD_REST}/v1/health >/dev/null 2>&1; do
+                    echo 'Hugegraph server are waiting for storage backend...'
+                    sleep 5
+                done
+                echo 'PD is reachable, waiting extra 10s for store registration...'
+                sleep 10
+                echo 'Storage backend is ready!'
+            " || echo "Warning: Timeout waiting for storage, proceeding anyway..."
+        else
+            echo "No pd.peers configured, skipping storage wait..."
+        fi

Review Comment:
   Partition-based readiness checks may be problematic because partition 
assignment happens asynchronously after `wait-storage` completes, and a 
correctly registered Store can legitimately report `partitionCount = 0` during 
normal initialization. Treating this as a failure condition risks blocking 
startup in healthy clusters rather than detecting real backend issues. In this 
case, would it be more appropriate to split validation into pre- and 
post-startup checks (e.g., Store/PD availability first, partition stabilization 
later)?
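
To make the suggestion concrete, here is a rough sketch of what such a split could look like. It is not a proposed patch: the helper names (`wait_until`, `pre_startup_check`, `post_startup_check`) are hypothetical, the only endpoint used is the `/v1/health` path already in the diff, and the partition probe is left as a caller-supplied command because no partition API is assumed here.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and timeouts are illustrative assumptions.

# Retry a command until it succeeds or the deadline passes.
# usage: wait_until <timeout_s> <interval_s> <command> [args...]
wait_until() {
    local timeout_s="$1" interval_s="$2"
    shift 2
    local deadline=$(( SECONDS + timeout_s ))
    until "$@" >/dev/null 2>&1; do
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval_s"
    done
    return 0
}

# Phase 1 (pre-startup, blocking): only checks that PD answers.
# A Store with partitionCount = 0 does not fail this phase.
# usage: pre_startup_check <pd_host:port> <timeout_s>
pre_startup_check() {
    local pd_rest="$1" timeout_s="$2"
    wait_until "$timeout_s" 5 curl -fsS "http://${pd_rest}/v1/health"
}

# Phase 2 (post-startup, advisory): waits for a caller-supplied
# partition probe, but only warns on timeout and never blocks startup.
# usage: post_startup_check <timeout_s> <probe-command> [args...]
post_startup_check() {
    local timeout_s="$1"
    shift
    if wait_until "$timeout_s" 5 "$@"; then
        echo "Partitions stabilized."
    else
        echo "Warning: partitions not yet assigned; continuing." >&2
    fi
    return 0
}

# Example wiring (commented out; my_partition_probe is a placeholder):
#   pre_startup_check "$PD_REST" "$WAIT_STORAGE_TIMEOUT_S" || exit 1
#   ... start the server ...
#   post_startup_check 120 my_partition_probe
```

The point of the split is that only the first phase can fail startup, so a healthy cluster that is still assigning partitions is reported with a warning rather than treated as a broken backend.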



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

