bitflicker64 commented on code in PR #2952:
URL: https://github.com/apache/incubator-hugegraph/pull/2952#discussion_r2844649965


##########
hugegraph-server/hugegraph-dist/src/assembly/static/bin/wait-storage.sh:
##########
@@ -70,7 +70,28 @@ done < <(env | sort -r | awk -F= '{ st = index($0, "="); print $1 " " substr($0,
 # wait for storage
 if env | grep '^hugegraph\.' > /dev/null; then
     if [ -n "${WAIT_STORAGE_TIMEOUT_S:-}" ]; then
-        timeout "${WAIT_STORAGE_TIMEOUT_S}s" bash -c \
-        "until bin/gremlin-console.sh -- -e $DETECT_STORAGE > /dev/null 2>&1; do echo \"Hugegraph server are waiting for storage backend...\"; sleep 5; done"
+        # Extract pd.peers from config or environment
+        PD_PEERS="${hugegraph_pd_peers:-}"
+        if [ -z "$PD_PEERS" ]; then
+            PD_PEERS=$(grep -E "^\s*pd\.peers\s*=" "$GRAPH_CONF" | sed 's/.*=\s*//' | tr -d ' ')
+        fi
+
+        if [ -n "$PD_PEERS" ]; then
+            # Convert gRPC address to REST address (8686 -> 8620)
+            PD_REST=$(echo "$PD_PEERS" | sed 's/:8686/:8620/g' | cut -d',' -f1)
+            echo "Waiting for PD REST endpoint at $PD_REST..."
+
+            timeout "${WAIT_STORAGE_TIMEOUT_S}s" bash -c "
+                until curl -fsS http://${PD_REST}/v1/health >/dev/null 2>&1; do
+                    echo 'Hugegraph server are waiting for storage backend...'
+                    sleep 5
+                done
+                echo 'PD is reachable, waiting extra 10s for store registration...'
+                sleep 10
+                echo 'Storage backend is ready!'
+            " || echo "Warning: Timeout waiting for storage, proceeding anyway..."
+        else
+            echo "No pd.peers configured, skipping storage wait..."
+        fi

Review Comment:
   Partition-based readiness checks may be unreliable here: partition
   assignment happens asynchronously after wait-storage completes, so a properly
   registered Store can legitimately report partitionCount = 0 during normal
   initialization. Treating that as a failure could block startup in an
   otherwise healthy cluster. Would it make sense to split validation into
   pre-startup checks (Store/PD availability) and post-startup checks
   (partition stabilization) instead?
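
   A minimal sketch of that split, to make the suggestion concrete. It is not
   taken from the PR: `retry_until`, `pre_startup_check`, `post_startup_check`
   and `CHECK_INTERVAL_S` are hypothetical names, and the partition probe is
   left as a caller-supplied command because no PD partition endpoint is
   assumed here. Only `/v1/health` comes from the diff above.

```shell
# Sketch only: function and variable names are illustrative, not from the PR.

# retry_until TIMEOUT_S INTERVAL_S CMD...
# Re-runs CMD until it succeeds or TIMEOUT_S seconds have elapsed.
# Returns 0 on success, 1 on timeout.
retry_until() {
    _timeout_s=$1
    _interval_s=$2
    shift 2
    _deadline=$(( $(date +%s) + _timeout_s ))
    until "$@" >/dev/null 2>&1; do
        if [ "$(date +%s)" -ge "$_deadline" ]; then
            return 1
        fi
        sleep "$_interval_s"
    done
    return 0
}

# Pre-startup gate: only PD availability decides the result.
# $1 = PD REST address (host:port), $2 = timeout in seconds.
# /v1/health is the endpoint already used in the diff.
pre_startup_check() {
    retry_until "$2" "${CHECK_INTERVAL_S:-5}" curl -fsS "http://$1/v1/health"
}

# Post-startup check: partition stabilization is advisory, so a timeout
# (for example partitionCount = 0 during initialization) only warns.
# $1 = timeout in seconds, remaining args = a caller-supplied probe command
# (hypothetical) that succeeds once partitions are assigned.
post_startup_check() {
    _post_timeout_s=$1
    shift
    if retry_until "$_post_timeout_s" "${CHECK_INTERVAL_S:-5}" "$@"; then
        echo "Partitions are assigned."
    else
        echo "Warning: partitions not assigned yet; server keeps running."
    fi
    return 0
}
```

   With something like this, wait-storage.sh could call
   `pre_startup_check "$PD_REST" "$WAIT_STORAGE_TIMEOUT_S"` before the server
   starts, and a later step could run `post_startup_check` with whatever
   partition probe turns out to be appropriate, without ever blocking startup.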



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

