codeant-ai-for-open-source[bot] commented on code in PR #37345:
URL: https://github.com/apache/superset/pull/37345#discussion_r2714677644
##########
docs/scripts/generate-database-docs.mjs:
##########
@@ -675,6 +675,75 @@ function updateReadme(databases) {
return false;
}
+/**
+ * Extract custom_errors from engine specs for troubleshooting documentation
+ * Returns a map of module names to their custom errors
+ */
+function extractCustomErrors() {
+ console.log('Extracting custom_errors from engine specs...');
+
+ try {
+ const scriptPath = path.join(__dirname, 'extract_custom_errors.py');
+ const result = spawnSync('python3', [scriptPath], {
+ cwd: ROOT_DIR,
+ encoding: 'utf-8',
+ timeout: 30000,
+ maxBuffer: 10 * 1024 * 1024,
+ });
+
+ if (result.error) {
+ throw result.error;
+ }
+ if (result.status !== 0) {
+ throw new Error(result.stderr || 'Python script failed');
+ }
+
+ const customErrors = JSON.parse(result.stdout);
+ const moduleCount = Object.keys(customErrors).length;
+ const errorCount = Object.values(customErrors).reduce((sum, classes) =>
+ sum + Object.values(classes).reduce((s, errs) => s + errs.length, 0), 0);
+ console.log(` Found ${errorCount} custom errors across ${moduleCount}
modules`);
+ return customErrors;
+ } catch (err) {
+ console.log(' Could not extract custom_errors:', err.message);
+ return null;
+ }
+}
+
+/**
+ * Merge custom_errors into database documentation
+ * Maps by module name since that's how both datasets are keyed
+ */
+function mergeCustomErrors(databases, customErrors) {
+ if (!customErrors) return;
+
+ let mergedCount = 0;
+
+ for (const [, db] of Object.entries(databases)) {
+ const moduleName = db.module;
+ if (!moduleName || !customErrors[moduleName]) continue;
+
Review Comment:
**Suggestion:** When docs are generated via the full Flask context,
`db.module` contains the full Python module path (for example,
`superset.db_engine_specs.postgres`), but `extract_custom_errors.py` keys its
results by the file stem (for example, `postgres`), so indexing
`customErrors[moduleName]` fails and `custom_errors` are never merged, causing
the Troubleshooting section to be empty in that mode; normalize the module name
to its last segment before doing the lookup so both data sources line up.
[logic error]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Troubleshooting section empty for Flask-mode docs.
- ⚠️ Developers miss synced error guidance.
- ⚠️ Docs preview doesn't reflect runtime error handling.
```
</details>
```suggestion
if (!moduleName) continue;
// Normalize module name to match extract_custom_errors.py keys
(basename)
const moduleKey = moduleName.split('.').pop();
if (!moduleKey || !customErrors[moduleKey]) continue;
// Get all errors from all classes in this module
const moduleErrors = customErrors[moduleKey];
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Run the docs generator in an environment where Flask context is available
so the script
chooses the full-run path: see tryRunFullScript() invocation at
docs/scripts/generate-database-docs.mjs:818 which assigns `databases =
tryRunFullScript();` (file: docs/scripts/generate-database-docs.mjs, line
~818). The
variable `databases` is therefore populated from the Flask-derived YAML/docs.
2. The script then calls extractCustomErrors() at
docs/scripts/generate-database-docs.mjs:841 which invokes the Python
extractor (file:
extract_custom_errors.py) and returns a map keyed by file stem (e.g.,
"postgres").
Confirmed call site: docs/scripts/generate-database-docs.mjs:841.
3. The merge loop at docs/scripts/generate-database-docs.mjs:722 iterates
databases and
reads module name from each db object via `const moduleName = db.module;`
(file:
docs/scripts/generate-database-docs.mjs, line 722). When databases were
produced by the
Flask path, db.module contains the full Python module path (e.g.,
"superset.db_engine_specs.postgres").
4. The current code does lookup using `customErrors[moduleName]`
(docs/scripts/generate-database-docs.mjs:723-724). Because the extractor
keys are file
stems like "postgres", the property lookup fails (no match) and the loop
continues without
merging. The result: db.documentation.custom_errors is not set and the
Troubleshooting
section remains empty for pages generated in Flask mode.
5. Reproduce locally: run docs/scripts/generate-database-docs.mjs in an
environment where
tryRunFullScript() succeeds (a development Superset install). Observe
console logs:
tryRunFullScript returns databases (line ~818) and then "Merged
custom_errors" log never
appears because mergeCustomErrors() finds no matches (loop at line 722).
Inspect one
generated database JSON entry in docs/src/data/databases.json: its
documentation lacks
custom_errors.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** docs/scripts/generate-database-docs.mjs
**Line:** 724:725
**Comment:**
*Logic Error: When docs are generated via the full Flask context,
`db.module` contains the full Python module path (for example,
`superset.db_engine_specs.postgres`), but `extract_custom_errors.py` keys its
results by the file stem (for example, `postgres`), so indexing
`customErrors[moduleName]` fails and `custom_errors` are never merged, causing
the Troubleshooting section to be empty in that mode; normalize the module name
to its last segment before doing the lookup so both data sources line up.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]