Re: [PR] Harden documentation link validation to prevent false CI passes [incubator-hugegraph-doc]

via GitHub Mon, 09 Feb 2026 05:35:11 -0800


imbajin commented on code in PR #452:
URL: https://github.com/apache/incubator-hugegraph-doc/pull/452#discussion_r2782667697



##########
dist/validate-links.sh:
##########
@@ -1,63 +1,132 @@
 #!/bin/bash
 
-# Configuration
 CONTENT_DIR="content"
 EXIT_CODE=0
 
+normalize_link() {
+    local link="$1"
+
+    link="${link%%#*}"
+    link="${link%%\?*}"
+
+    if [[ "$link" != "/" ]]; then
+        link="${link%/}"
+    fi
+
+    printf "%s" "$link"
+}
+
+check_internal_link() {
+    local link="$1"
+    local file="$2"
+    local line_no="$3"
+    local clean_link
+    local target_path
+
+    clean_link=$(normalize_link "$link")
+
+    [[ -z "$clean_link" || "$clean_link" == "#" ]] && return 0
+
+    if [[ "$clean_link" == "{{<"* || "$clean_link" == "{{%"* || "$clean_link" == "{{"* ]]; then
+        return 0
+    fi
+
+    local clean_link_lower="${clean_link,,}"
+
+    if [[ "$clean_link_lower" == http://* || "$clean_link_lower" == https://* || "$clean_link_lower" == "//"* ]]; then
+        return 0
+    fi
+
+    case "$clean_link_lower" in
+        mailto:*|tel:*|javascript:*|data:*)
+            return 0
+            ;;
+    esac
+
+    if [[ "$clean_link" == /docs/* ]]; then
+        target_path="content/en${clean_link}"
+    elif [[ "$clean_link" == /cn/docs/* ]]; then
+        target_path="content${clean_link}"
+    elif [[ "$clean_link" == /* ]]; then
+        target_path="content/en${clean_link}"
+    else
+        local file_dir
+        file_dir=$(dirname "$file")
+        target_path="${file_dir}/${clean_link}"
+
+        while [[ "$target_path" == *"/./"* ]]; do
+            target_path="${target_path//\/.\//\/}"
+        done
+
+        while [[ "$target_path" =~ ([^/]+/\.\./?) ]]; do

Review Comment:
   ⚠️ The path normalization here looks incorrect and may silently produce
   wrong target paths.
   
   The current reduction removes the matched substring, but it does not remove
   the preceding directory component the way real path canonicalization would.
   
   Suggestion: use a more reliable canonicalization approach (an existing
   tool, if available), and ensure the result stays under the expected root
   directory.
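
To make the concern above concrete, here is a minimal pure-bash sketch of lexical canonicalization followed by a containment check. It is not the PR's code and not necessarily the approach the reviewer had in mind; the helper names `canonicalize_path` and `is_under_root` are hypothetical, and the sketch assumes relative paths (a leading `/` is dropped) and does not consult the filesystem or follow symlinks.

```shell
#!/bin/bash

# Hypothetical helper: resolve "." and ".." components of a relative path
# using a stack of components. Purely lexical; the filesystem is not touched.
canonicalize_path() {
    local path="$1"
    local -a parts=() stack=()
    local part n

    IFS='/' read -r -a parts <<< "$path"
    for part in "${parts[@]}"; do
        case "$part" in
            ''|.)
                ;;                          # skip empty and "." components
            ..)
                n=${#stack[@]}
                if (( n > 0 )) && [[ "${stack[n-1]}" != ".." ]]; then
                    unset 'stack[n-1]'      # ".." cancels the previous component
                else
                    stack+=("..")           # keep leading ".." so escapes stay visible
                fi
                ;;
            *)
                stack+=("$part")
                ;;
        esac
    done

    local IFS='/'
    printf '%s' "${stack[*]}"
}

# Hypothetical helper: succeed only if the canonical form of path $2
# is the root directory $1 itself or lies beneath it.
is_under_root() {
    local root="$1"
    local resolved
    resolved=$(canonicalize_path "$2")
    [[ "$resolved" == "$root" || "$resolved" == "$root"/* ]]
}
```

With this sketch, `canonicalize_path "content/en/docs/a/../b"` yields `content/en/docs/b`, and `is_under_root content "content/../etc/passwd"` fails because the resolved path no longer starts with `content/`.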
   



##########
dist/validate-links.sh:
##########
@@ -1,63 +1,132 @@
 #!/bin/bash
 
-# Configuration
 CONTENT_DIR="content"
 EXIT_CODE=0
 
+normalize_link() {
+    local link="$1"
+
+    link="${link%%#*}"
+    link="${link%%\?*}"
+
+    if [[ "$link" != "/" ]]; then
+        link="${link%/}"
+    fi
+
+    printf "%s" "$link"
+}
+
+check_internal_link() {
+    local link="$1"
+    local file="$2"
+    local line_no="$3"
+    local clean_link
+    local target_path
+
+    clean_link=$(normalize_link "$link")
+
+    [[ -z "$clean_link" || "$clean_link" == "#" ]] && return 0
+
+    if [[ "$clean_link" == "{{<"* || "$clean_link" == "{{%"* || "$clean_link" == "{{"* ]]; then
+        return 0
+    fi
+
+    local clean_link_lower="${clean_link,,}"
+
+    if [[ "$clean_link_lower" == http://* || "$clean_link_lower" == https://* || "$clean_link_lower" == "//"* ]]; then
+        return 0
+    fi
+
+    case "$clean_link_lower" in
+        mailto:*|tel:*|javascript:*|data:*)
+            return 0
+            ;;

Review Comment:
   ⚠️ Missing or empty line numbers should be handled defensively.
   
   In the new extraction pipeline, the line number comes from grep matches. If
   anything changes (different grep, empty match, etc.), it could be empty and
   the printed location would be malformed.
   
   Suggestion: guard the value before printing, or only print the line-number
   suffix when it is non-empty.
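
The second option in the suggestion above could be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the helper name `format_location` and the `file:line` output format are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: build a "file:line" location string, omitting the
# ":line" suffix when no line number is available.
format_location() {
    local file="$1"
    local line_no="${2:-}"      # tolerate a missing second argument

    if [[ -n "$line_no" ]]; then
        printf '%s:%s' "$file" "$line_no"
    else
        printf '%s' "$file"
    fi
}
```

Called as `format_location "content/en/docs/a.md" "12"` it prints the file with a `:12` suffix; with an empty or absent line number it prints only the file name, so no dangling separator appears in the report.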
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

