codeant-ai-for-open-source[bot] commented on code in PR #37345:
URL: https://github.com/apache/superset/pull/37345#discussion_r2714677644


##########
docs/scripts/generate-database-docs.mjs:
##########
@@ -675,6 +675,75 @@ function updateReadme(databases) {
   return false;
 }
 
+/**
+ * Extract custom_errors from engine specs for troubleshooting documentation
+ * Returns a map of module names to their custom errors
+ */
+function extractCustomErrors() {
+  console.log('Extracting custom_errors from engine specs...');
+
+  try {
+    const scriptPath = path.join(__dirname, 'extract_custom_errors.py');
+    const result = spawnSync('python3', [scriptPath], {
+      cwd: ROOT_DIR,
+      encoding: 'utf-8',
+      timeout: 30000,
+      maxBuffer: 10 * 1024 * 1024,
+    });
+
+    if (result.error) {
+      throw result.error;
+    }
+    if (result.status !== 0) {
+      throw new Error(result.stderr || 'Python script failed');
+    }
+
+    const customErrors = JSON.parse(result.stdout);
+    const moduleCount = Object.keys(customErrors).length;
+    const errorCount = Object.values(customErrors).reduce((sum, classes) =>
+      sum + Object.values(classes).reduce((s, errs) => s + errs.length, 0), 0);
+    console.log(`  Found ${errorCount} custom errors across ${moduleCount} 
modules`);
+    return customErrors;
+  } catch (err) {
+    console.log('  Could not extract custom_errors:', err.message);
+    return null;
+  }
+}
+
+/**
+ * Merge custom_errors into database documentation
+ * Maps by module name since that's how both datasets are keyed
+ */
+function mergeCustomErrors(databases, customErrors) {
+  if (!customErrors) return;
+
+  let mergedCount = 0;
+
+  for (const [, db] of Object.entries(databases)) {
+    const moduleName = db.module;
+    if (!moduleName || !customErrors[moduleName]) continue;
+

Review Comment:
   **Suggestion:** When docs are generated via the full Flask context, 
`db.module` contains the full Python module path (for example, 
`superset.db_engine_specs.postgres`), but `extract_custom_errors.py` keys its 
results by the file stem (for example, `postgres`), so indexing 
`customErrors[moduleName]` fails and `custom_errors` are never merged, causing 
the Troubleshooting section to be empty in that mode; normalize the module name 
to its last segment before doing the lookup so both data sources line up. 
[logic error]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Troubleshooting section empty for Flask-mode docs.
   - ⚠️ Developers miss synced error guidance.
   - ⚠️ Docs preview doesn't reflect runtime error handling.
   ```
   </details>
   
   ```suggestion
       if (!moduleName) continue;
   
       // Normalize module name to match extract_custom_errors.py keys 
(basename)
       const moduleKey = moduleName.split('.').pop();
       if (!moduleKey || !customErrors[moduleKey]) continue;
   
       // Get all errors from all classes in this module
       const moduleErrors = customErrors[moduleKey];
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Run the docs generator in an environment where Flask context is available 
so the script
   chooses the full-run path: see tryRunFullScript() invocation at
   docs/scripts/generate-database-docs.mjs:818 which assigns `databases =
   tryRunFullScript();` (file: docs/scripts/generate-database-docs.mjs, line 
~818). The
   variable `databases` is therefore populated from the Flask-derived YAML/docs.
   
   2. The script then calls extractCustomErrors() at
   docs/scripts/generate-database-docs.mjs:841 which invokes the Python 
extractor (file:
   extract_custom_errors.py) and returns a map keyed by file stem (e.g., 
"postgres").
   Confirmed call site: docs/scripts/generate-database-docs.mjs:841.
   
   3. The merge loop at docs/scripts/generate-database-docs.mjs:722 iterates 
databases and
   reads module name from each db object via `const moduleName = db.module;` 
(file:
   docs/scripts/generate-database-docs.mjs, line 722). When databases were 
produced by the
   Flask path, db.module contains the full Python module path (e.g.,
   "superset.db_engine_specs.postgres").
   
   4. The current code does lookup using `customErrors[moduleName]`
   (docs/scripts/generate-database-docs.mjs:723-724). Because the extractor 
keys are file
   stems like "postgres", the property lookup fails (no match) and the loop 
continues without
   merging. The result: db.documentation.custom_errors is not set and the 
Troubleshooting
   section remains empty for pages generated in Flask mode.
   
   5. Reproduce locally: run docs/scripts/generate-database-docs.mjs in an 
environment where
   tryRunFullScript() succeeds (a development Superset install). Observe 
console logs:
   tryRunFullScript returns databases (line ~818) and then "Merged 
custom_errors" log never
   appears because mergeCustomErrors() finds no matches (loop at line 722). 
Inspect one
   generated database JSON entry in docs/src/data/databases.json: its 
documentation lacks
   custom_errors.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** docs/scripts/generate-database-docs.mjs
   **Line:** 724:725
   **Comment:**
        *Logic Error: When docs are generated via the full Flask context, 
`db.module` contains the full Python module path (for example, 
`superset.db_engine_specs.postgres`), but `extract_custom_errors.py` keys its 
results by the file stem (for example, `postgres`), so indexing 
`customErrors[moduleName]` fails and `custom_errors` are never merged, causing 
the Troubleshooting section to be empty in that mode; normalize the module name 
to its last segment before doing the lookup so both data sources line up.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to