imbajin commented on code in PR #2952:
URL: https://github.com/apache/incubator-hugegraph/pull/2952#discussion_r2837673519


##########
hugegraph-server/hugegraph-dist/src/assembly/static/bin/wait-storage.sh:
##########
@@ -70,7 +70,28 @@ done < <(env | sort -r | awk -F= '{ st = index($0, "="); print $1 " " substr($0,
 # wait for storage
 if env | grep '^hugegraph\.' > /dev/null; then
     if [ -n "${WAIT_STORAGE_TIMEOUT_S:-}" ]; then
-        timeout "${WAIT_STORAGE_TIMEOUT_S}s" bash -c \
-        "until bin/gremlin-console.sh -- -e $DETECT_STORAGE > /dev/null 2>&1; do echo \"Hugegraph server are waiting for storage backend...\"; sleep 5; done"
+        # Extract pd.peers from config or environment
+        PD_PEERS="${hugegraph_pd_peers:-}"
+        if [ -z "$PD_PEERS" ]; then
+            PD_PEERS=$(grep -E "^\s*pd\.peers\s*=" "$GRAPH_CONF" | sed 's/.*=\s*//' | tr -d ' ')
+        fi
+
+        if [ -n "$PD_PEERS" ]; then
+            # Convert gRPC address to REST address (8686 -> 8620)
+            PD_REST=$(echo "$PD_PEERS" | sed 's/:8686/:8620/g' | cut -d',' -f1)
+            echo "Waiting for PD REST endpoint at $PD_REST..."
+
+            timeout "${WAIT_STORAGE_TIMEOUT_S}s" bash -c "
+                until curl -fsS http://${PD_REST}/v1/health >/dev/null 2>&1; do
+                    echo 'Hugegraph server are waiting for storage backend...'
+                    sleep 5
+                done
+                echo 'PD is reachable, waiting extra 10s for store registration...'
+                sleep 10
+                echo 'Storage backend is ready!'
+            " || echo "Warning: Timeout waiting for storage, proceeding anyway..."
+        else
+            echo "No pd.peers configured, skipping storage wait..."
+        fi

Review Comment:
   ‼️ **Critical: wait-storage check downgraded from actual graph connectivity 
test to PD health ping only**
   
   The PR deletes `detect-storage.groovy` and replaces the Gremlin-based 
storage readiness check with a simple `curl http://${PD_REST}/v1/health` + a 
fixed 10s sleep. This is a significant regression in correctness:
   
   - PD being healthy **does not** mean that the Store has registered, that partitions are assigned, or that the server can actually read from and write to the backend
   - The old check used the Gremlin console to open the actual graph — a true 
end-to-end proof that storage is ready
   - On timeout the new code merely warns and proceeds (`echo "Warning: Timeout waiting for storage, proceeding anyway..."`); the server can start against a broken backend with no hard failure
   
   Suggested approach: restore a functional storage readiness probe. At 
minimum, poll the store's partition/leader endpoint (e.g. `/v1/health` or 
`/v1/partitions`) and verify non-zero partition/leader count before proceeding. 
Fail hard on timeout rather than warning-and-continue.
   
   ```suggestion
        timeout "${WAIT_STORAGE_TIMEOUT_S}s" bash -c "
            until curl -fsS http://${PD_REST}/v1/health >/dev/null 2>&1 &&
                  [ \"\$(curl -fsS http://${STORE_REST}/v1/partitions 2>/dev/null | python3 -c 'import sys,json; print(json.load(sys.stdin).get(\"partitionCount\", 0))' 2>/dev/null)\" -gt 0 ]; do
                echo 'Waiting for storage backend (PD + Store partitions)...'
                sleep 5
            done
            echo 'Storage backend is ready!'
        " || { echo 'ERROR: Timeout waiting for storage backend'; exit 1; }
   ```
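   As a side note on the address derivation in the PR: the gRPC-to-REST port rewrite deserves to be isolated and tested on its own. A minimal sketch, assuming the default ports (PD gRPC on 8686, PD REST on 8620); `pd_rest_from_peers` is a hypothetical helper name, and if either port is overridden in config this mapping silently produces a wrong address:

   ```shell
   #!/usr/bin/env bash
   # Hypothetical helper isolating the gRPC -> REST rewrite from wait-storage.sh.
   # Assumes default ports: PD gRPC 8686, PD REST 8620 (wrong if overridden).
   pd_rest_from_peers() {
       local peers="$1"
       # take the first peer only, then swap the trailing gRPC port for REST
       echo "$peers" | cut -d',' -f1 | sed 's/:8686$/:8620/'
   }

   pd_rest_from_peers "pd1:8686,pd2:8686,pd3:8686"   # -> pd1:8620
   ```

   Anchoring the `sed` pattern with `$` avoids accidentally rewriting a host segment that happens to contain `:8686`; reading the actual REST port from PD config would be more robust still.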



##########
docker/docker-compose.yml:
##########
@@ -15,44 +15,98 @@
 # limitations under the License.
 #
 
-version: "3"
+name: hugegraph-single
+
+networks:
+  hg-net:
+    driver: bridge
+
+volumes:
+  hg-pd-data:
+  hg-store-data:
 
 services:
   pd:
-    image: hugegraph/pd
-    container_name: pd
+    build:
+      context: ..
+      dockerfile: hugegraph-pd/Dockerfile
+    image: hugegraph/pd:${HUGEGRAPH_VERSION:-1.7.0}
+    container_name: hg-pd
     hostname: pd
-    network_mode: host
+    restart: unless-stopped
+    networks: [hg-net]
+    environment:
+      HG_PD_GRPC_HOST: pd
+      HG_PD_GRPC_PORT: "8686"
+      HG_PD_REST_PORT: "8620"
+      HG_PD_RAFT_ADDRESS: pd:8610
+      HG_PD_RAFT_PEERS_LIST: pd:8610
+      HG_PD_INITIAL_STORE_LIST: store:8500
+      HG_PD_DATA_PATH: /hugegraph-pd/pd_data
+    ports:
+      - "8620:8620"

Review Comment:
   ‼️ **Internal cluster ports (gRPC/Raft) should not be published to the host**
   
   The compose file now publishes several internal ports to the host that 
should only be accessible between containers:
   
   - `8686:8686` — PD gRPC (internal)
   - `8500:8500` — Store gRPC (internal)  
   - `8510:8510` — Store Raft (internal)
   
   These are cluster-internal communication ports. Exposing them on the host:
   1. Unnecessarily expands the attack surface
   2. Can conflict with other services on the host
   3. Creates confusion for users about what's "public" vs "internal"
   
   Only the REST/management ports need to be reachable from the host (`8620`, 
`8520`, `8080`). The gRPC/Raft ports should be removed from the `ports` section 
— they're already reachable between containers via the `hg-net` bridge network.
   
   ```suggestion
       ports:
         - "8620:8620"
   ```
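   For the `pd` service, one possible shape of the fix, a sketch assuming the service names and ports above are kept; `expose` is purely documentary here, since containers on the `hg-net` bridge can already reach each other on any port:

   ```yaml
   services:
     pd:
       networks: [hg-net]
       expose:
         - "8686"   # PD gRPC, container-to-container only
         - "8610"   # PD Raft, container-to-container only
       ports:
         - "8620:8620"   # PD REST, the only port published to the host
   ```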



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

