errose28 commented on code in PR #297:
URL: https://github.com/apache/ozone-site/pull/297#discussion_r2742930178


##########
docs/08-developer-guide/02-run/02-docker-compose.md:
##########
@@ -20,7 +20,7 @@ Before you begin, ensure you have installed:
 
 ### Step 1: Build from Source
 
-First, build Ozone from source following our [Build with Maven](/docs/08-developer-guide/01-build/01-maven.md) guide.
+First, build Ozone from source following our [Build with Maven](/docs/developer-guide/build/maven) guide.

Review Comment:
   Absolute paths will break when we add versioning to the docs. Can we make
   the check also enforce relative paths? I think this would simply mean the
   path cannot start with `/`.



##########
docusaurus.config.js:
##########
@@ -112,6 +112,59 @@ const config = {
 
       return result;
     },
+    /*
+    Validate internal markdown links to ensure they don't contain number prefixes or file extensions.
+    These can break when the ordering or format of the target page is updated.
+    Docusaurus can resolve links without these.
+    See https://docusaurus.io/docs/api/docusaurus-config#markdown for reference.
+    */
+    preprocessor: (/** @type {{filePath: string, fileContent: string}} */ params) => {
+      const {filePath, fileContent} = params;
+
+      // Validate internal links format
+      const internalLinkPattern = /\[([^\]]+)\]\(([^)]+\.md(?:#[^)]+)?)\)/g;

Review Comment:
   This still allows `.mdx` files. We should be able to validate that the link
   has no file extension at all, regardless of what that extension is.
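
One possible way to detect any extension, sketched under assumptions: `hasFileExtension` is a hypothetical helper, not code from the PR, and it only treats a suffix as an extension when it starts with a letter, so a page named `release-1.4` would not be flagged. Links to static assets (images, downloads) would likely need an exemption that this sketch does not cover.

```javascript
// Hypothetical helper, not part of the PR: true when the last path segment of
// a link ends in something that looks like a file extension (.md, .mdx, ...).
function hasFileExtension(linkPath) {
  // Drop any query string or anchor so "#v1.2" is not mistaken for an extension.
  const pathOnly = linkPath.split(/[?#]/)[0];
  // Only the final segment can carry the extension.
  const lastSegment = pathOnly.split('/').pop();
  // Assumption: an extension is a dot followed by a letter, then letters/digits.
  return /\.[A-Za-z][A-Za-z0-9]*$/.test(lastSegment);
}

console.log(hasFileExtension('./maven.md'));         // true
console.log(hasFileExtension('../run/docker.mdx#x')); // true
console.log(hasFileExtension('../build/maven'));     // false
```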



##########
docusaurus.config.js:
##########
@@ -112,6 +112,59 @@ const config = {
 
       return result;
     },
+    /*
+    Validate internal markdown links to ensure they don't contain number prefixes or file extensions.
+    These can break when the ordering or format of the target page is updated.
+    Docusaurus can resolve links without these.
+    See https://docusaurus.io/docs/api/docusaurus-config#markdown for reference.
+    */
+    preprocessor: (/** @type {{filePath: string, fileContent: string}} */ params) => {
+      const {filePath, fileContent} = params;
+
+      // Validate internal links format
+      const internalLinkPattern = /\[([^\]]+)\]\(([^)]+\.md(?:#[^)]+)?)\)/g;
+      const numberPrefixPattern = /\/\d{2}-[^/]+/;
+
+      let matches;
+      const invalidLinks = [];
+
+      while ((matches = internalLinkPattern.exec(fileContent)) !== null) {
+        const linkText = matches[1];
+        const linkPath = matches[2];
+
+        // Skip external links (http://, https://, mailto:, etc.)
+        if (/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(linkPath)) {
+          continue;
+        }
+
+        // Skip absolute paths from site root (starting with /)
+        if (linkPath.startsWith('/')) {
+          continue;
+        }
+
+        // Check for number prefixes or .md extensions
+        if (numberPrefixPattern.test(linkPath) || linkPath.includes('.md')) {
+          invalidLinks.push({
+            text: linkText,
+            path: linkPath,
+            line: fileContent.substring(0, matches.index).split('\n').length
+          });
+        }
+      }
+
+      if (invalidLinks.length > 0) {
+        const errorMsg = invalidLinks.map(link =>
+          `  Line ${link.line}: [${link.text}](${link.path})`
+        ).join('\n');
+
+        console.error('Invalid internal links found in', filePath + ':\n' + errorMsg);
+        console.error('\nInternal links should not include number prefixes or .md extensions.');

Review Comment:
   ```suggestion
           console.error('\nInternal links should not include number prefixes or file extensions.');
   ```



##########
docusaurus.config.js:
##########
@@ -112,6 +112,59 @@ const config = {
 
       return result;
     },
+    /*
+    Validate internal markdown links to ensure they don't contain number prefixes or file extensions.
+    These can break when the ordering or format of the target page is updated.
+    Docusaurus can resolve links without these.
+    See https://docusaurus.io/docs/api/docusaurus-config#markdown for reference.
+    */
+    preprocessor: (/** @type {{filePath: string, fileContent: string}} */ params) => {
+      const {filePath, fileContent} = params;
+
+      // Validate internal links format
+      const internalLinkPattern = /\[([^\]]+)\]\(([^)]+\.md(?:#[^)]+)?)\)/g;
+      const numberPrefixPattern = /\/\d{2}-[^/]+/;
+
+      let matches;
+      const invalidLinks = [];
+
+      while ((matches = internalLinkPattern.exec(fileContent)) !== null) {
+        const linkText = matches[1];
+        const linkPath = matches[2];
+
+        // Skip external links (http://, https://, mailto:, etc.)
+        if (/^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(linkPath)) {
+          continue;
+        }
+
+        // Skip absolute paths from site root (starting with /)
+        if (linkPath.startsWith('/')) {
+          continue;
+        }
+
+        // Check for number prefixes or .md extensions
+        if (numberPrefixPattern.test(linkPath) || linkPath.includes('.md')) {

Review Comment:
   There's already a file extension check in the `internalLinkPattern` regex. 
Do we need both?
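
To illustrate the overlap, the snippet below runs the `internalLinkPattern` from the diff against a made-up sample string (the sample links are assumptions for demonstration only). Every path the regex captures already contains `.md`, so the later `includes('.md')` test cannot be false, and a link that has a number prefix but no extension is never inspected at all.

```javascript
// Regex copied from the diff; the sample content is hypothetical.
const internalLinkPattern = /\[([^\]]+)\]\(([^)]+\.md(?:#[^)]+)?)\)/g;
const sample =
  '[a](./01-build/maven.md) [b](./01-build/maven) [c](guide.md#top)';

// Collect the captured link paths (second capture group of each match).
const captured = [...sample.matchAll(internalLinkPattern)].map((m) => m[2]);

console.log(captured);
// [ './01-build/maven.md', 'guide.md#top' ]
console.log(captured.every((p) => p.includes('.md'))); // true
```

Link `[b]` carries a number prefix but is skipped because the regex only matches targets ending in `.md`.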



##########
docusaurus.config.js:
##########
@@ -112,6 +112,59 @@ const config = {
 
       return result;
     },
+    /*
+    Validate internal markdown links to ensure they don't contain number prefixes or file extensions.
+    These can break when the ordering or format of the target page is updated.
+    Docusaurus can resolve links without these.
+    See https://docusaurus.io/docs/api/docusaurus-config#markdown for reference.
+    */
+    preprocessor: (/** @type {{filePath: string, fileContent: string}} */ params) => {
+      const {filePath, fileContent} = params;
+
+      // Validate internal links format
+      const internalLinkPattern = /\[([^\]]+)\]\(([^)]+\.md(?:#[^)]+)?)\)/g;
+      const numberPrefixPattern = /\/\d{2}-[^/]+/;

Review Comment:
   A relative path does not have to start with `./`, just like a regular
   filesystem path. In that case the number prefix is at the very start of the
   string, with no preceding `/` for this pattern to match, but it still needs
   to be detected.
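
A small comparison of the pattern from the diff against one that also accepts the start of the string. The name `anchoredPattern` and the sample paths are assumptions for illustration, not code from the PR.

```javascript
// Pattern from the diff: requires a "/" before the two-digit prefix.
const originalPattern = /\/\d{2}-[^/]+/;
// Assumed alternative: the prefix may follow "/" or sit at the start of the path.
const anchoredPattern = /(?:^|\/)\d{2}-[^/]+/;

for (const linkPath of ['01-build/maven', './01-build/maven', 'build/maven']) {
  console.log(
    linkPath,
    originalPattern.test(linkPath),
    anchoredPattern.test(linkPath)
  );
}
// 01-build/maven false true
// ./01-build/maven true true
// build/maven false false
```

Neither regex uses the `g` flag, so repeated `test` calls do not carry `lastIndex` state between paths.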



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

