codeant-ai-for-open-source[bot] commented on code in PR #37070:
URL: https://github.com/apache/superset/pull/37070#discussion_r2684859440


##########
.cursor/rules/superset-multi-tenancy.mdc:
##########
@@ -0,0 +1,36 @@
+---
+alwaysApply: true
+---
+
+# Planning Gate – Superset Multi-Tenant (Schema/Connection Isolation)
+
+Before writing code, produce these docs in /docs/multitenancy/:
+
+1) tenant-discovery.md
+   - Chosen discovery pattern (subdomain as "{tenant}.mydomain.com")

Review Comment:
   **Suggestion:** Host/header-based tenant discovery note is unsafe as 
written: relying solely on Host header or simple subdomain parsing 
("{tenant}.mydomain.com") without strict validation allows host-header 
manipulation and tenant spoofing—recommend enforcing canonical host checks, 
validating against a tenant registry, and rejecting unknown/forged Host 
headers. [security]
   
   **Severity Level:** Critical 🚨
   ```suggestion
      - Chosen discovery pattern (subdomain as "{tenant}.mydomain.com") — 
enforce canonical Host header validation and map hosts to tenant entries from 
the Tenants registry; never trust raw Host header without normalization and 
allowlist checks to prevent host-header injection or tenant spoofing
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The doc currently suggests simple subdomain parsing; without canonical host 
validation and a tenancy registry this is vulnerable
   to Host header manipulation. The improved wording calls out concrete 
mitigations and is relevant to the plan.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** .cursor/rules/superset-multi-tenancy.mdc
   **Line:** 10:10
   **Comment:**
        *Security: Host/header-based tenant discovery note is unsafe as 
written: relying solely on Host header or simple subdomain parsing 
("{tenant}.mydomain.com") without strict validation allows host-header 
manipulation and tenant spoofing—recommend enforcing canonical host checks, 
validating against a tenant registry, and rejecting unknown/forged Host headers.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
.cursor/rules/apache-superset-architect.mdc:
##########
@@ -0,0 +1,36 @@
+---
+description: Apache Superset Architect and Designer for Cursor IDE
+globs: ["**/*.py", "**/*.ts", "**/*.tsx", "**/*.js", "**/*.jsx", "**/*.sql", 
"**/*.md"]
+alwaysApply: true
+---
+
+# Software Architect – Apache Superset Extension
+
+ROLE
+Act as a senior Software Architect designing an extension for Apache Superset. 
Respect Superset’s extension surfaces (viz plugins via @superset-ui, DB Engine 
Specs) and community contribution protocols. Avoid core patches unless a SIP is 
warranted.
+
+OBJECTIVE
+Design a maintainable, future-proof extension that stays compatible with 
upstream changes and follows Superset’s coding, testing, and review standards.
+
+DELIVERABLE FORMAT
+1) Context & objectives
+2) Architecture & diagrams (components, interfaces, data flow)
+3) Design principles & patterns (SOLID/DRY; Strategy/Adapter/Factory/Facade)
+4) Contribution plan (SIP if needed; PR titles/labels; artifacts/screenshots)
+5) Testing & CI/CD (pytest/Jest/Playwright; coverage; pre-commit; GH Actions)
+6) Compatibility & versioning strategy (avoid private APIs; feature flags as 
needed)
+7) Security & observability (logging/metrics; ASF vuln norms)
+8) Risks & mitigations
+
+SUPERESET-SPECIFIC GUARDRAILS

Review Comment:
   **Suggestion:** Typo in the section header: "SUPERESET-SPECIFIC GUARDRAILS" 
is misspelled and should be "SUPERSET-SPECIFIC GUARDRAILS"; this will cause 
confusing search/selection by readers and any automated lookup by exact header 
text to fail. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
   SUPERSET-SPECIFIC GUARDRAILS
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   There's a real typo in the file ("SUPERESET" vs "SUPERSET"/"Superset") 
visible in the final state; fixing it improves readability and prevents 
exact-text lookup failures. It's a small, correctable bug.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** .cursor/rules/apache-superset-architect.mdc
   **Line:** 25:25
   **Comment:**
        *Possible Bug: Typo in the section header: "SUPERESET-SPECIFIC 
GUARDRAILS" is misspelled and should be "SUPERSET-SPECIFIC GUARDRAILS"; this 
will cause confusing search/selection by readers and any automated lookup by 
exact header text to fail.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/multitenancy/commands/seed_tenants.py:
##########
@@ -0,0 +1,190 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Command to seed tenants table with development data"""
+
+import json
+import logging
+from pathlib import Path
+from typing import Any
+
+import click
+from flask.cli import with_appcontext
+
+from superset import db
+from superset.multitenancy.factories.tenant_factory import TenantFactory
+from superset.multitenancy.models.tenant import Tenant
+from superset.multitenancy.repositories.tenant_repository import 
TenantRepository
+
+logger = logging.getLogger(__name__)

Review Comment:
   **Suggestion:** Unused logger: `logger` is created but never used; either 
use `logger` for messages (recommended) or remove the instantiation to avoid 
unused-variable warnings and confusion. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
   # logger = logging.getLogger(__name__)  # removed unused logger (use logging 
or click.echo consistently)
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The logger is instantiated and never referenced. Either remove it or use it 
for informational/error logs — both options are valid and improve code clarity.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/multitenancy/commands/seed_tenants.py
   **Line:** 32:32
   **Comment:**
        *Possible Bug: Unused logger: `logger` is created but never used; 
either use `logger` for messages (recommended) or remove the instantiation to 
avoid unused-variable warnings and confusion.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
.cursor/rules/superset-multi-tenancy.mdc:
##########
@@ -0,0 +1,36 @@
+---
+alwaysApply: true
+---
+
+# Planning Gate – Superset Multi-Tenant (Schema/Connection Isolation)
+
+Before writing code, produce these docs in /docs/multitenancy/:
+
+1) tenant-discovery.md
+   - Chosen discovery pattern (subdomain as "{tenant}.mydomain.com")
+   - Tenants table (columns: slug, oauth_issuer, client_id, client_secret, 
scopes, db_sqlalchemy_uri, default_schema)

Review Comment:
   **Suggestion:** Security risk: the plan lists `client_secret` as a column in 
the Tenants table without any guidance to protect it; storing raw client 
secrets in the application database can expose credentials if the DB is 
compromised—recommend removing raw secret storage or explicitly calling out 
using a secrets manager or encryption-at-rest with strict access controls. 
[security]
   
   **Severity Level:** Critical 🚨
   ```suggestion
      - Tenants table (columns: slug, oauth_issuer, client_id, scopes, 
db_sqlalchemy_uri, default_schema) — do NOT store raw `client_secret` in 
application DB; store secrets in a dedicated secrets manager (e.g., Vault, AWS 
Secrets Manager) or encrypt them with a key management service and document 
access control/rotation procedures
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This is a real security concern: the PR's doc explicitly includes 
client_secret in the tenants table.
   Removing raw secrets or mandating a secrets manager/KMS mitigates credential 
exposure and
   credential-rotation risks. The suggested change directly improves security 
posture.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** .cursor/rules/superset-multi-tenancy.mdc
   **Line:** 11:11
   **Comment:**
        *Security: Security risk: the plan lists `client_secret` as a column in 
the Tenants table without any guidance to protect it; storing raw client 
secrets in the application database can expose credentials if the DB is 
compromised—recommend removing raw secret storage or explicitly calling out 
using a secrets manager or encryption-at-rest with strict access controls.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
.cursor/rules/superset-multi-tenancy.mdc:
##########
@@ -0,0 +1,36 @@
+---
+alwaysApply: true
+---
+
+# Planning Gate – Superset Multi-Tenant (Schema/Connection Isolation)
+
+Before writing code, produce these docs in /docs/multitenancy/:
+
+1) tenant-discovery.md
+   - Chosen discovery pattern (subdomain as "{tenant}.mydomain.com")
+   - Tenants table (columns: slug, oauth_issuer, client_id, client_secret, 
scopes, db_sqlalchemy_uri, default_schema)
+   - Request flow: incoming URL → lookup → set g.tenant → pick OAuth provider 
→ authorize
+
+2) auth-design.md
+   - CustomSecurityManager responsibilities:
+     - Select OAuth provider dynamically from Tenants
+     - oauth_user_info mapping (claims→username/email/roles)
+   - Session/CSRF handling for API calls
+   - Fallback/error UX if tenant not found
+
+3) provisioning.md
+   - API calls for:
+     - POST /api/v1/database (create tenant connection)

Review Comment:
   **Suggestion:** Resource exhaustion/design risk: allowing POST 
/api/v1/database to create tenant connections without provisioning limits or 
pooling guidance can exhaust DB connection pools and resources; recommend 
documenting quotas, validation, lazy provisioning, and connection-pooling 
limits. [resource leak]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
        - POST /api/v1/database (create tenant connection) — document and 
enforce quotas, input validation, lazy provisioning, and connection-pooling 
limits; validate DB URIs, require admin auth, and ensure creation cannot 
exhaust global connection pools
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Allowing unaudited creation of DB connections per-tenant is an operational 
risk. The suggested additions (quotas, validation, lazy provisioning, pool 
limits) address a real resource-exhaustion vector.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** .cursor/rules/superset-multi-tenancy.mdc
   **Line:** 23:23
   **Comment:**
        *Resource Leak: Resource exhaustion/design risk: allowing POST 
/api/v1/database to create tenant connections without provisioning limits or 
pooling guidance can exhaust DB connection pools and resources; recommend 
documenting quotas, validation, lazy provisioning, and connection-pooling 
limits.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
.cursor/rules/apache-superset-architect.mdc:
##########
@@ -0,0 +1,36 @@
+---
+description: Apache Superset Architect and Designer for Cursor IDE
+globs: ["**/*.py", "**/*.ts", "**/*.tsx", "**/*.js", "**/*.jsx", "**/*.sql", 
"**/*.md"]

Review Comment:
   **Suggestion:** Configuration bug in the frontmatter `globs` list: the 
rule's own file extension (`.mdc`) is not included, so the rule will not match 
`.mdc` files (including itself) — add `"**/*.mdc"` to the glob list to ensure 
the rule applies to .mdc files. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
   globs: ["**/*.py", "**/*.ts", "**/*.tsx", "**/*.js", "**/*.jsx", "**/*.sql", 
"**/*.md", "**/*.mdc"]
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The rule's frontmatter currently omits "*.mdc", so the rule may not match 
.mdc files (including itself). Adding "**/*.mdc" is a practical fix to ensure 
the rule applies where intended.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** .cursor/rules/apache-superset-architect.mdc
   **Line:** 3:3
   **Comment:**
        *Logic Error: Configuration bug in the frontmatter `globs` list: the 
rule's own file extension (`.mdc`) is not included, so the rule will not match 
`.mdc` files (including itself) — add `"**/*.mdc"` to the glob list to ensure 
the rule applies to .mdc files.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset-frontend/webpack.config.js:
##########
@@ -62,9 +62,26 @@ const {
 
 // Precedence: CLI args > env vars > defaults
 const devserverPort = cliPort || process.env.WEBPACK_DEVSERVER_PORT || 9000;
+// Default to 127.0.0.1 for security (localhost only)
+// For multitenancy subdomain support, override via 
WEBPACK_DEVSERVER_HOST=0.0.0.0
+// Can be overridden via CLI or WEBPACK_DEVSERVER_HOST env var
 const devserverHost =
   cliHost || process.env.WEBPACK_DEVSERVER_HOST || '127.0.0.1';
 
+// Allowed hosts for webpack dev server
+// Default: localhost and .localhost for security
+// For multitenancy: Set WEBPACK_DEVSERVER_ALLOWED_HOSTS='all' or specific 
domains
+// Examples:
+//   - 'all' (allow any host - use with caution)
+//   - '.analytics.local' (allow all subdomains of analytics.local)
+//   - ['localhost', '.localhost', '.analytics.local'] (specific list)
+const devserverAllowedHosts =
+  process.env.WEBPACK_DEVSERVER_ALLOWED_HOSTS
+    ? process.env.WEBPACK_DEVSERVER_ALLOWED_HOSTS === 'all'
+      ? 'all'
+      : process.env.WEBPACK_DEVSERVER_ALLOWED_HOSTS.split(',').map((h) => 
h.trim())

Review Comment:
   **Suggestion:** Splitting WEBPACK_DEVSERVER_ALLOWED_HOSTS on commas without 
filtering can produce empty strings (e.g. trailing comma), resulting in an 
empty host entry being passed to webpack's allowedHosts; filter out falsy/empty 
entries after trimming. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
         : process.env.WEBPACK_DEVSERVER_ALLOWED_HOSTS.split(',').map((h) => 
h.trim()).filter(Boolean)
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Splitting an env var that has trailing commas or accidental empty entries 
will produce empty strings in the array. Filtering falsy values after trimming 
prevents passing '' into webpack's allowedHosts and avoids unexpected behavior. 
This is a tidy, correct fix.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset-frontend/webpack.config.js
   **Line:** 82:82
   **Comment:**
        *Possible Bug: Splitting WEBPACK_DEVSERVER_ALLOWED_HOSTS on commas 
without filtering can produce empty strings (e.g. trailing comma), resulting in 
an empty host entry being passed to webpack's allowedHosts; filter out 
falsy/empty entries after trimming.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset-frontend/webpack.proxy-config.js:
##########
@@ -155,10 +155,21 @@ module.exports = newManifest => {
   return {
     context: '/',
     target: backend,
-    hostRewrite: true,
+    // Preserve original Host header for multitenancy subdomain discovery
+    // hostRewrite: false allows original Host header to pass through
+    hostRewrite: false,
     changeOrigin: true,
     cookieDomainRewrite: '', // remove cookie domain
     selfHandleResponse: true, // so that the onProxyRes takes care of sending 
the response
+    onProxyReq(proxyReq, req) {
+      // Preserve original Host header for tenant discovery
+      // The original Host header (e.g., acme.analytics.local:9001) is needed 
for subdomain-based tenant discovery
+      if (req.headers.host) {
+        proxyReq.setHeader('Host', req.headers.host);
+        // Also set X-Forwarded-Host for reference (standard proxy header)
+        proxyReq.setHeader('X-Forwarded-Host', req.headers.host);

Review Comment:
   **Suggestion:** Logic bug: the new code unconditionally overwrites 
`X-Forwarded-Host`, dropping any previously forwarded host values; this breaks 
proper proxy header chaining. Preserve and append any existing 
`x-forwarded-host` instead of replacing it. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           const incomingHost = req.headers.host;
           const prev = req.headers['x-forwarded-host'];
           proxyReq.setHeader('X-Forwarded-Host', prev ? 
`${prev},${incomingHost}` : incomingHost);
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This is a valid, low-risk improvement: preserve existing x-forwarded-host 
chaining instead of clobbering it. It avoids losing upstream proxy information 
and follows standard proxy header behavior.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset-frontend/webpack.proxy-config.js
   **Line:** 170:170
   **Comment:**
        *Logic Error: Logic bug: the new code unconditionally overwrites 
`X-Forwarded-Host`, dropping any previously forwarded host values; this breaks 
proper proxy header chaining. Preserve and append any existing 
`x-forwarded-host` instead of replacing it.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
.cursor/rules/superset-multi-tenancy.mdc:
##########
@@ -0,0 +1,36 @@
+---
+alwaysApply: true
+---
+
+# Planning Gate – Superset Multi-Tenant (Schema/Connection Isolation)
+
+Before writing code, produce these docs in /docs/multitenancy/:
+
+1) tenant-discovery.md
+   - Chosen discovery pattern (subdomain as "{tenant}.mydomain.com")
+   - Tenants table (columns: slug, oauth_issuer, client_id, client_secret, 
scopes, db_sqlalchemy_uri, default_schema)
+   - Request flow: incoming URL → lookup → set g.tenant → pick OAuth provider 
→ authorize

Review Comment:
   **Suggestion:** Concurrency/global-state risk: instructing middleware to 
"set g.tenant" (Flask `g`) is ambiguous and can cause thread-local or async 
context issues if not implemented correctly; recommend explicitly documenting 
storing tenant in the request context/middleware and avoiding shared global 
mutable state across threads/async tasks. [race condition]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
      - Request flow: incoming URL → lookup → set tenant in the request 
context/middleware (avoid shared global mutable state like process-wide 
globals) → pick OAuth provider → authorize; ensure context propagation works 
for async workers and background tasks
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Pointing out Flask's `g` vs explicit request context is valid: `g` is 
request-local in sync WSGI but can be misused with async workers or background 
tasks.
   The suggested clarification prevents race conditions and incorrect tenant 
propagation.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** .cursor/rules/superset-multi-tenancy.mdc
   **Line:** 12:12
   **Comment:**
        *Race Condition: Concurrency/global-state risk: instructing middleware 
to "set g.tenant" (Flask `g`) is ambiguous and can cause thread-local or async 
context issues if not implemented correctly; recommend explicitly documenting 
storing tenant in the request context/middleware and avoiding shared global 
mutable state across threads/async tasks.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/multitenancy/commands/seed_tenants.py:
##########
@@ -0,0 +1,190 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Command to seed tenants table with development data"""
+
+import json
+import logging
+from pathlib import Path
+from typing import Any
+
+import click
+from flask.cli import with_appcontext
+
+from superset import db
+from superset.multitenancy.factories.tenant_factory import TenantFactory
+from superset.multitenancy.models.tenant import Tenant

Review Comment:
   **Suggestion:** Unused import: `Tenant` is imported but never used anywhere 
in this module; leaving unused imports causes noise, may fail linters, and can 
hide intended type checks - remove the import or use it for type annotations. 
[possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
   # from superset.multitenancy.models.tenant import Tenant  # removed unused 
import
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The import is present in the final file but never used — removing it fixes 
linter warnings and is a safe, low-risk cleanup that doesn't affect runtime 
behavior.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/multitenancy/commands/seed_tenants.py
   **Line:** 29:29
   **Comment:**
        *Possible Bug: Unused import: `Tenant` is imported but never used 
anywhere in this module; leaving unused imports causes noise, may fail linters, 
and can hide intended type checks - remove the import or use it for type 
annotations.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to