codeant-ai-for-open-source[bot] commented on code in PR #37070:
URL: https://github.com/apache/superset/pull/37070#discussion_r2684859440
##########
.cursor/rules/superset-multi-tenancy.mdc:
##########
@@ -0,0 +1,36 @@
+---
+alwaysApply: true
+---
+
+# Planning Gate – Superset Multi-Tenant (Schema/Connection Isolation)
+
+Before writing code, produce these docs in /docs/multitenancy/:
+
+1) tenant-discovery.md
+ - Chosen discovery pattern (subdomain as "{tenant}.mydomain.com")
Review Comment:
**Suggestion:** Host/header-based tenant discovery note is unsafe as
written: relying solely on Host header or simple subdomain parsing
("{tenant}.mydomain.com") without strict validation allows host-header
manipulation and tenant spoofing—recommend enforcing canonical host checks,
validating against a tenant registry, and rejecting unknown/forged Host
headers. [security]
**Severity Level:** Critical 🚨
```suggestion
- Chosen discovery pattern (subdomain as "{tenant}.mydomain.com") —
enforce canonical Host header validation and map hosts to tenant entries from
the Tenants registry; never trust raw Host header without normalization and
allowlist checks to prevent host-header injection or tenant spoofing
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The doc currently suggests simple subdomain parsing; without canonical host
validation and a tenancy registry this is vulnerable
to Host header manipulation. The improved wording calls out concrete
mitigations and is relevant to the plan.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** .cursor/rules/superset-multi-tenancy.mdc
**Line:** 10:10
**Comment:**
*Security: Host/header-based tenant discovery note is unsafe as
written: relying solely on Host header or simple subdomain parsing
("{tenant}.mydomain.com") without strict validation allows host-header
manipulation and tenant spoofing—recommend enforcing canonical host checks,
validating against a tenant registry, and rejecting unknown/forged Host headers.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
.cursor/rules/apache-superset-architect.mdc:
##########
@@ -0,0 +1,36 @@
+---
+description: Apache Superset Architect and Designer for Cursor IDE
+globs: ["**/*.py", "**/*.ts", "**/*.tsx", "**/*.js", "**/*.jsx", "**/*.sql",
"**/*.md"]
+alwaysApply: true
+---
+
+# Software Architect – Apache Superset Extension
+
+ROLE
+Act as a senior Software Architect designing an extension for Apache Superset.
Respect Superset’s extension surfaces (viz plugins via @superset-ui, DB Engine
Specs) and community contribution protocols. Avoid core patches unless a SIP is
warranted.
+
+OBJECTIVE
+Design a maintainable, future-proof extension that stays compatible with
upstream changes and follows Superset’s coding, testing, and review standards.
+
+DELIVERABLE FORMAT
+1) Context & objectives
+2) Architecture & diagrams (components, interfaces, data flow)
+3) Design principles & patterns (SOLID/DRY; Strategy/Adapter/Factory/Facade)
+4) Contribution plan (SIP if needed; PR titles/labels; artifacts/screenshots)
+5) Testing & CI/CD (pytest/Jest/Playwright; coverage; pre-commit; GH Actions)
+6) Compatibility & versioning strategy (avoid private APIs; feature flags as
needed)
+7) Security & observability (logging/metrics; ASF vuln norms)
+8) Risks & mitigations
+
+SUPERESET-SPECIFIC GUARDRAILS
Review Comment:
**Suggestion:** Typo in the section header: "SUPERESET-SPECIFIC GUARDRAILS"
is misspelled and should be "SUPERSET-SPECIFIC GUARDRAILS"; this will cause
confusing search/selection by readers and any automated lookup by exact header
text to fail. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
SUPERSET-SPECIFIC GUARDRAILS
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
There's a real typo in the file ("SUPERESET" vs "SUPERSET"/"Superset")
visible in the final state; fixing it improves readability and prevents
exact-text lookup failures. It's a small, correctable bug.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** .cursor/rules/apache-superset-architect.mdc
**Line:** 25:25
**Comment:**
*Possible Bug: Typo in the section header: "SUPERESET-SPECIFIC
GUARDRAILS" is misspelled and should be "SUPERSET-SPECIFIC GUARDRAILS"; this
will cause confusing search/selection by readers and any automated lookup by
exact header text to fail.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/multitenancy/commands/seed_tenants.py:
##########
@@ -0,0 +1,190 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Command to seed tenants table with development data"""
+
+import json
+import logging
+from pathlib import Path
+from typing import Any
+
+import click
+from flask.cli import with_appcontext
+
+from superset import db
+from superset.multitenancy.factories.tenant_factory import TenantFactory
+from superset.multitenancy.models.tenant import Tenant
+from superset.multitenancy.repositories.tenant_repository import
TenantRepository
+
+logger = logging.getLogger(__name__)
Review Comment:
**Suggestion:** Unused logger: `logger` is created but never used; either
use `logger` for messages (recommended) or remove the instantiation to avoid
unused-variable warnings and confusion. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
# logger = logging.getLogger(__name__) # removed unused logger (use logging
or click.echo consistently)
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The logger is instantiated and never referenced. Either remove it or use it
for informational/error logs — both options are valid and improve code clarity.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/multitenancy/commands/seed_tenants.py
**Line:** 32:32
**Comment:**
*Possible Bug: Unused logger: `logger` is created but never used;
either use `logger` for messages (recommended) or remove the instantiation to
avoid unused-variable warnings and confusion.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
.cursor/rules/superset-multi-tenancy.mdc:
##########
@@ -0,0 +1,36 @@
+---
+alwaysApply: true
+---
+
+# Planning Gate – Superset Multi-Tenant (Schema/Connection Isolation)
+
+Before writing code, produce these docs in /docs/multitenancy/:
+
+1) tenant-discovery.md
+ - Chosen discovery pattern (subdomain as "{tenant}.mydomain.com")
+ - Tenants table (columns: slug, oauth_issuer, client_id, client_secret,
scopes, db_sqlalchemy_uri, default_schema)
Review Comment:
**Suggestion:** Security risk: the plan lists `client_secret` as a column in
the Tenants table without any guidance to protect it; storing raw client
secrets in the application database can expose credentials if the DB is
compromised—recommend removing raw secret storage or explicitly calling out
using a secrets manager or encryption-at-rest with strict access controls.
[security]
**Severity Level:** Critical 🚨
```suggestion
- Tenants table (columns: slug, oauth_issuer, client_id, scopes,
db_sqlalchemy_uri, default_schema) — do NOT store raw `client_secret` in
application DB; store secrets in a dedicated secrets manager (e.g., Vault, AWS
Secrets Manager) or encrypt them with a key management service and document
access control/rotation procedures
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
This is a real security concern: the PR's doc explicitly includes
client_secret in the tenants table.
Removing raw secrets or mandating a secrets manager/KMS mitigates credential
exposure and
credential-rotation risks. The suggested change directly improves security
posture.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** .cursor/rules/superset-multi-tenancy.mdc
**Line:** 11:11
**Comment:**
*Security: Security risk: the plan lists `client_secret` as a column in
the Tenants table without any guidance to protect it; storing raw client
secrets in the application database can expose credentials if the DB is
compromised—recommend removing raw secret storage or explicitly calling out
using a secrets manager or encryption-at-rest with strict access controls.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
.cursor/rules/superset-multi-tenancy.mdc:
##########
@@ -0,0 +1,36 @@
+---
+alwaysApply: true
+---
+
+# Planning Gate – Superset Multi-Tenant (Schema/Connection Isolation)
+
+Before writing code, produce these docs in /docs/multitenancy/:
+
+1) tenant-discovery.md
+ - Chosen discovery pattern (subdomain as "{tenant}.mydomain.com")
+ - Tenants table (columns: slug, oauth_issuer, client_id, client_secret,
scopes, db_sqlalchemy_uri, default_schema)
+ - Request flow: incoming URL → lookup → set g.tenant → pick OAuth provider
→ authorize
+
+2) auth-design.md
+ - CustomSecurityManager responsibilities:
+ - Select OAuth provider dynamically from Tenants
+ - oauth_user_info mapping (claims→username/email/roles)
+ - Session/CSRF handling for API calls
+ - Fallback/error UX if tenant not found
+
+3) provisioning.md
+ - API calls for:
+ - POST /api/v1/database (create tenant connection)
Review Comment:
**Suggestion:** Resource exhaustion/design risk: allowing POST
/api/v1/database to create tenant connections without provisioning limits or
pooling guidance can exhaust DB connection pools and resources; recommend
documenting quotas, validation, lazy provisioning, and connection-pooling
limits. [resource leak]
**Severity Level:** Minor ⚠️
```suggestion
- POST /api/v1/database (create tenant connection) — document and
enforce quotas, input validation, lazy provisioning, and connection-pooling
limits; validate DB URIs, require admin auth, and ensure creation cannot
exhaust global connection pools
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Allowing unaudited creation of DB connections per-tenant is an operational
risk. The suggested additions (quotas, validation, lazy provisioning, pool
limits) address a real resource-exhaustion vector.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** .cursor/rules/superset-multi-tenancy.mdc
**Line:** 23:23
**Comment:**
*Resource Leak: Resource exhaustion/design risk: allowing POST
/api/v1/database to create tenant connections without provisioning limits or
pooling guidance can exhaust DB connection pools and resources; recommend
documenting quotas, validation, lazy provisioning, and connection-pooling
limits.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
.cursor/rules/apache-superset-architect.mdc:
##########
@@ -0,0 +1,36 @@
+---
+description: Apache Superset Architect and Designer for Cursor IDE
+globs: ["**/*.py", "**/*.ts", "**/*.tsx", "**/*.js", "**/*.jsx", "**/*.sql",
"**/*.md"]
Review Comment:
**Suggestion:** Configuration bug in the frontmatter `globs` list: the
rule's own file extension (`.mdc`) is not included, so the rule will not match
`.mdc` files (including itself) — add `"**/*.mdc"` to the glob list to ensure
the rule applies to .mdc files. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
globs: ["**/*.py", "**/*.ts", "**/*.tsx", "**/*.js", "**/*.jsx", "**/*.sql",
"**/*.md", "**/*.mdc"]
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The rule's frontmatter currently omits "*.mdc", so the rule may not match
.mdc files (including itself). Adding "**/*.mdc" is a practical fix to ensure
the rule applies where intended.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** .cursor/rules/apache-superset-architect.mdc
**Line:** 3:3
**Comment:**
*Logic Error: Configuration bug in the frontmatter `globs` list: the
rule's own file extension (`.mdc`) is not included, so the rule will not match
`.mdc` files (including itself) — add `"**/*.mdc"` to the glob list to ensure
the rule applies to .mdc files.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset-frontend/webpack.config.js:
##########
@@ -62,9 +62,26 @@ const {
// Precedence: CLI args > env vars > defaults
const devserverPort = cliPort || process.env.WEBPACK_DEVSERVER_PORT || 9000;
+// Default to 127.0.0.1 for security (localhost only)
+// For multitenancy subdomain support, override via
WEBPACK_DEVSERVER_HOST=0.0.0.0
+// Can be overridden via CLI or WEBPACK_DEVSERVER_HOST env var
const devserverHost =
cliHost || process.env.WEBPACK_DEVSERVER_HOST || '127.0.0.1';
+// Allowed hosts for webpack dev server
+// Default: localhost and .localhost for security
+// For multitenancy: Set WEBPACK_DEVSERVER_ALLOWED_HOSTS='all' or specific
domains
+// Examples:
+// - 'all' (allow any host - use with caution)
+// - '.analytics.local' (allow all subdomains of analytics.local)
+// - ['localhost', '.localhost', '.analytics.local'] (specific list)
+const devserverAllowedHosts =
+ process.env.WEBPACK_DEVSERVER_ALLOWED_HOSTS
+ ? process.env.WEBPACK_DEVSERVER_ALLOWED_HOSTS === 'all'
+ ? 'all'
+ : process.env.WEBPACK_DEVSERVER_ALLOWED_HOSTS.split(',').map((h) =>
h.trim())
Review Comment:
**Suggestion:** Splitting WEBPACK_DEVSERVER_ALLOWED_HOSTS on commas without
filtering can produce empty strings (e.g. trailing comma), resulting in an
empty host entry being passed to webpack's allowedHosts; filter out falsy/empty
entries after trimming. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
: process.env.WEBPACK_DEVSERVER_ALLOWED_HOSTS.split(',').map((h) =>
h.trim()).filter(Boolean)
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Splitting an env var that has trailing commas or accidental empty entries
will produce empty strings in the array. Filtering falsy values after trimming
prevents passing '' into webpack's allowedHosts and avoids unexpected behavior.
This is a tidy, correct fix.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset-frontend/webpack.config.js
**Line:** 82:82
**Comment:**
*Possible Bug: Splitting WEBPACK_DEVSERVER_ALLOWED_HOSTS on commas
without filtering can produce empty strings (e.g. trailing comma), resulting in
an empty host entry being passed to webpack's allowedHosts; filter out
falsy/empty entries after trimming.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset-frontend/webpack.proxy-config.js:
##########
@@ -155,10 +155,21 @@ module.exports = newManifest => {
return {
context: '/',
target: backend,
- hostRewrite: true,
+ // Preserve original Host header for multitenancy subdomain discovery
+ // hostRewrite: false allows original Host header to pass through
+ hostRewrite: false,
changeOrigin: true,
cookieDomainRewrite: '', // remove cookie domain
selfHandleResponse: true, // so that the onProxyRes takes care of sending
the response
+ onProxyReq(proxyReq, req) {
+ // Preserve original Host header for tenant discovery
+ // The original Host header (e.g., acme.analytics.local:9001) is needed
for subdomain-based tenant discovery
+ if (req.headers.host) {
+ proxyReq.setHeader('Host', req.headers.host);
+ // Also set X-Forwarded-Host for reference (standard proxy header)
+ proxyReq.setHeader('X-Forwarded-Host', req.headers.host);
Review Comment:
**Suggestion:** Logic bug: the new code unconditionally overwrites
`X-Forwarded-Host`, dropping any previously forwarded host values; this breaks
proper proxy header chaining. Preserve and append any existing
`x-forwarded-host` instead of replacing it. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
const incomingHost = req.headers.host;
const prev = req.headers['x-forwarded-host'];
proxyReq.setHeader('X-Forwarded-Host', prev ?
`${prev},${incomingHost}` : incomingHost);
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
This is a valid, low-risk improvement: preserve existing x-forwarded-host
chaining instead of clobbering it. It avoids losing upstream proxy information
and follows standard proxy header behavior.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset-frontend/webpack.proxy-config.js
**Line:** 170:170
**Comment:**
*Logic Error: Logic bug: the new code unconditionally overwrites
`X-Forwarded-Host`, dropping any previously forwarded host values; this breaks
proper proxy header chaining. Preserve and append any existing
`x-forwarded-host` instead of replacing it.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
.cursor/rules/superset-multi-tenancy.mdc:
##########
@@ -0,0 +1,36 @@
+---
+alwaysApply: true
+---
+
+# Planning Gate – Superset Multi-Tenant (Schema/Connection Isolation)
+
+Before writing code, produce these docs in /docs/multitenancy/:
+
+1) tenant-discovery.md
+ - Chosen discovery pattern (subdomain as "{tenant}.mydomain.com")
+ - Tenants table (columns: slug, oauth_issuer, client_id, client_secret,
scopes, db_sqlalchemy_uri, default_schema)
+ - Request flow: incoming URL → lookup → set g.tenant → pick OAuth provider
→ authorize
Review Comment:
**Suggestion:** Concurrency/global-state risk: instructing middleware to
"set g.tenant" (Flask `g`) is ambiguous and can cause thread-local or async
context issues if not implemented correctly; recommend explicitly documenting
storing tenant in the request context/middleware and avoiding shared global
mutable state across threads/async tasks. [race condition]
**Severity Level:** Minor ⚠️
```suggestion
- Request flow: incoming URL → lookup → set tenant in the request
context/middleware (avoid shared global mutable state like process-wide
globals) → pick OAuth provider → authorize; ensure context propagation works
for async workers and background tasks
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Pointing out Flask's `g` vs explicit request context is valid: `g` is
request-local in sync WSGI but can be misused with async workers or background
tasks.
The suggested clarification prevents race conditions and incorrect tenant
propagation.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** .cursor/rules/superset-multi-tenancy.mdc
**Line:** 12:12
**Comment:**
*Race Condition: Concurrency/global-state risk: instructing middleware
to "set g.tenant" (Flask `g`) is ambiguous and can cause thread-local or async
context issues if not implemented correctly; recommend explicitly documenting
storing tenant in the request context/middleware and avoiding shared global
mutable state across threads/async tasks.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/multitenancy/commands/seed_tenants.py:
##########
@@ -0,0 +1,190 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Command to seed tenants table with development data"""
+
+import json
+import logging
+from pathlib import Path
+from typing import Any
+
+import click
+from flask.cli import with_appcontext
+
+from superset import db
+from superset.multitenancy.factories.tenant_factory import TenantFactory
+from superset.multitenancy.models.tenant import Tenant
Review Comment:
**Suggestion:** Unused import: `Tenant` is imported but never used anywhere
in this module; leaving unused imports causes noise, may fail linters, and can
hide intended type checks - remove the import or use it for type annotations.
[possible bug]
**Severity Level:** Critical 🚨
```suggestion
# from superset.multitenancy.models.tenant import Tenant # removed unused
import
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The import is present in the final file but never used — removing it fixes
linter warnings and is a safe, low-risk cleanup that doesn't affect runtime
behavior.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/multitenancy/commands/seed_tenants.py
**Line:** 29:29
**Comment:**
*Possible Bug: Unused import: `Tenant` is imported but never used
anywhere in this module; leaving unused imports causes noise, may fail linters,
and can hide intended type checks - remove the import or use it for type
annotations.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]