codeant-ai-for-open-source[bot] commented on code in PR #34525:
URL: https://github.com/apache/superset/pull/34525#discussion_r2771789313


##########
superset/tasks/cache.py:
##########
@@ -118,8 +99,12 @@ class DummyStrategy(Strategy):  # pylint: disable=too-few-public-methods
 
     name = "dummy"
 
-    def get_tasks(self) -> list[CacheWarmupTask]:
-        return [get_task(chart) for chart in db.session.query(Slice).all()]
+    def get_urls(self) -> list[str]:
+        dashboards = (
+            db.session.query(Dashboard).filter(Dashboard.published.is_(True)).all()
+        )
+
+        return [get_dash_url(dashboard) for dashboard in dashboards if dashboard.slices]
 

Review Comment:
   **Suggestion:** The dummy warmup strategy uses `dashboard.slices` in a list 
comprehension, which will lazily load slices for each dashboard and trigger one 
database query per dashboard, causing an N+1 query performance bug when there 
are many dashboards. [performance]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Increased DB load during scheduled cache warmup tasks.
   - ⚠️ Longer task runtime for many dashboards.
   - ❌ Potential task timeouts on large installations.
   ```
   </details>
   
   ```suggestion
               db.session.query(Dashboard)
               .filter(Dashboard.published.is_(True), Dashboard.slices.any())
               .all()
           )
   
           return [get_dash_url(dashboard) for dashboard in dashboards]
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Identify the entry point: the DummyStrategy.get_urls() method is defined 
in
   superset/tasks/cache.py and called from cache_warmup() when strategy_name == 
"dummy" (see
   superset/tasks/cache.py: @celery_app.task cache_warmup and strategies list).
   
   2. Configure and run the Celery task: trigger the cache_warmup task with
   strategy_name='dummy' (the Celery task defined at superset/tasks/cache.py 
calls
   strategy.get_urls()).
   
   3. Observe DB queries when building URLs: the current implementation loads 
all published
   dashboards via db.session.query(...).all() (superset/tasks/cache.py lines 
~102-104), then
   iterates dashboards and accesses dashboard.slices in the list comprehension 
(line ~108).
   
   4. Reproduce N+1 behavior: with many dashboards in the DB (e.g., 100), run 
the task and
   inspect SQL logs — each dashboard.slices access triggers a separate 
lazy-load query for
   slices, producing ~1 (initial dashboards query) + N additional queries. The 
code path and
   file locations are verifiable in superset/tasks/cache.py where 
DummyStrategy.get_urls is
   defined.
   
   5. Confirm effect: replacing the per-instance attribute check with a single 
SQL predicate
   (Dashboard.slices.any()) avoids the per-dashboard extra queries and returns 
only
   dashboards that already have slices, as suggested.
   ```
   </details>
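The N+1 pattern flagged above can be seen without a database. Below is a minimal pure-Python sketch (stand-in classes, not Superset or SQLAlchemy code) in which a `slices` property logs a "query" on each access, mimicking a lazily loaded relationship:

```python
# Hypothetical sketch: simulates why accessing a lazy-loaded relationship
# inside a comprehension causes N+1 queries, and why filtering in SQL
# (e.g. Dashboard.slices.any()) avoids the extra per-row queries.

query_log = []

class LazyDashboard:
    """Stand-in for an ORM object whose .slices triggers a query on access."""
    def __init__(self, pk, slice_count):
        self.pk = pk
        self._slice_count = slice_count

    @property
    def slices(self):
        # Each access simulates one lazy-load query, as in the review.
        query_log.append(f"SELECT * FROM slices WHERE dashboard_id={self.pk}")
        return list(range(self._slice_count))

def get_urls_n_plus_one(dashboards):
    # One extra "query" per dashboard: the pattern the review flags.
    return [f"/dashboard/{d.pk}/" for d in dashboards if d.slices]

dashboards = [LazyDashboard(i, i % 3) for i in range(1, 101)]
query_log.append("SELECT * FROM dashboards WHERE published")  # initial load
urls = get_urls_n_plus_one(dashboards)
print(len(query_log))  # 1 initial query + 100 lazy loads = 101
```

With the suggested `Dashboard.slices.any()` predicate, the filtering happens inside the initial SQL query, so the loop never touches the relationship and the log would stay at a single entry.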
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/tasks/cache.py
   **Line:** 104:108
   **Comment:**
        *Performance: The dummy warmup strategy uses `dashboard.slices` in a 
list comprehension, which will lazily load slices for each dashboard and 
trigger one database query per dashboard, causing an N+1 query performance bug 
when there are many dashboards.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/tasks/cache.py:
##########
@@ -307,25 +229,32 @@ def cache_warmup(
         logger.exception(message)
         return message
 
-    results: dict[str, list[str]] = {"scheduled": [], "errors": []}
-    for task in strategy.get_tasks():
-        username = task["username"]
-        payload = json.dumps(task["payload"])
-        if username:
+    results: dict[str, list[str]] = {"success": [], "errors": []}
+
+    user = security_manager.find_user(
+        username=current_app.config["SUPERSET_CACHE_WARMUP_USER"]
+    )
+    if not user:
+        message = (
+            f"Cache warmup user '{current_app.config['SUPERSET_CACHE_WARMUP_USER']}' "
+            "not found. Please configure SUPERSET_CACHE_WARMUP_USER with a valid username."
+        )
+        logger.error(message)
+        return message
+
+    wd = WebDriverSelenium(current_app.config["WEBDRIVER_TYPE"], user=user)
+
+    try:
+        for url in strategy.get_urls():
             try:
-                user = security_manager.get_user_by_username(username)
-                cookies = MachineAuthProvider.get_auth_cookies(user)
-                headers = {
-                    "Cookie": "session=%s" % cookies.get("session", ""),
-                    "Content-Type": "application/json",
-                }
-                logger.info("Scheduling %s", payload)
-                fetch_url.delay(payload, headers)
-                results["scheduled"].append(payload)
-            except SchedulingError:
-                logger.exception("Error scheduling fetch_url for payload: %s", payload)
-                results["errors"].append(payload)
-        else:
-            logger.warning("Executor not found for %s", payload)
+                logger.info("Fetching %s", url)
+                wd.get_screenshot(url, "grid-container")
+                results["success"].append(url)
+            except URLError:
+                logger.exception("Error warming up cache!")

Review Comment:
   **Suggestion:** The error handling for dashboard fetches only catches 
`URLError`, which is specific to urllib-based HTTP calls, but this code now 
uses Selenium via `WebDriverSelenium`, so typical WebDriver failures will raise 
different exceptions and cause the entire cache warmup task to fail instead of 
being recorded per-URL. [possible bug]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Task aborts on WebDriver errors; stops remaining warmups.
   - ⚠️ Per-URL failures not recorded currently.
   - ⚠️ Affects cache warmup reliability in real deployments.
   ```
   </details>
   
   ```suggestion
                   except Exception:
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Entry point: cache_warmup Celery task in superset/tasks/cache.py executes 
for each URL
   returned by strategy.get_urls() and calls WebDriverSelenium.get_screenshot 
(see
   superset/tasks/cache.py lines where wd is created and used).
   
   2. Simulate a WebDriver failure: cause WebDriver to fail for a dashboard URL 
(examples:
   WebDriver crash, element not found, timeouts). These will raise 
Selenium-specific
   exceptions (e.g., selenium.common.exceptions.WebDriverException, 
TimeoutException), not
   urllib.error.URLError.
   
   3. Run the task and observe behavior: because only URLError is caught in the 
per-URL
   try/except (superset/tasks/cache.py lines ~251-255), Selenium exceptions 
will escape the
   inner except block and propagate to the outer try, potentially crashing the 
task or
   skipping the per-URL result handling. SQL logs and task logs will show an 
unhandled
   exception from WebDriverSelenium.get_screenshot.
   
   4. Reproduce and verify: run the task with an intentionally invalid 
dashboard or
   misconfigured WebDriver (e.g., stop the browser service) and confirm an 
exception bubbles
   up and the remaining URLs are not processed. Changing the except clause to 
catch broader
   exceptions (or specific Selenium exceptions) records per-URL failures in 
results["errors"]
   and prevents the whole task from failing.
   ```
   </details>
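The per-URL containment the review recommends can be sketched independently of Selenium. The helper below is a hypothetical stand-in (not Superset's `cache_warmup`), where `fetch` plays the role of `WebDriverSelenium.get_screenshot`:

```python
# Hedged sketch of the error-handling pattern suggested above: catch a broad
# Exception (or specific Selenium exceptions) inside the loop so one failing
# URL is recorded in results["errors"] instead of aborting the whole task.

import logging

logger = logging.getLogger(__name__)

def warm_up(urls, fetch):
    """Iterate URLs, recording per-URL success/failure without aborting."""
    results = {"success": [], "errors": []}
    for url in urls:
        try:
            fetch(url)
            results["success"].append(url)
        except Exception:  # would cover WebDriverException, TimeoutException, URLError
            logger.exception("Error warming up cache for %s", url)
            results["errors"].append(url)
    return results

def flaky_fetch(url):
    # Simulated WebDriver failure for one URL.
    if "bad" in url:
        raise RuntimeError("simulated WebDriver crash")

results = warm_up(["/dash/1/", "/dash/bad/", "/dash/2/"], flaky_fetch)
# The URL after the failure is still processed; the failure is recorded.
```

Catching specific Selenium exception types instead of bare `Exception` would be tighter, at the cost of missing unexpected failure modes.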
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/tasks/cache.py
   **Line:** 254:254
   **Comment:**
        *Possible Bug: The error handling for dashboard fetches only catches 
`URLError`, which is specific to urllib-based HTTP calls, but this code now 
uses Selenium via `WebDriverSelenium`, so typical WebDriver failures will raise 
different exceptions and cause the entire cache warmup task to fail instead of 
being recorded per-URL.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/utils/screenshots.py:
##########
@@ -204,13 +206,13 @@ def driver(self, window_size: WindowSize | None = None) -> WebDriver:
             )
 
         # Use Selenium as default/fallback
-        return WebDriverSelenium(self.driver_type, window_size)
+        return WebDriverSelenium(self.driver_type, window_size, user)
 
     def get_screenshot(
         self, user: User, window_size: WindowSize | None = None

Review Comment:
   **Suggestion:** The new `get_screenshot` signature now requires a `user` 
argument with no default, which is a breaking change from the previous API 
(`get_screenshot(window_size: WindowSize | None = None)`); any existing callers 
that omit `user` (or call it positionally with just `window_size`) will now 
fail with a `TypeError`, so the `user` parameter should be made optional with a 
default (e.g. `None`) to preserve backward compatibility while still allowing 
explicit users for authentication. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Thumbnail generation scripts calling get_screenshot without user will 
error.
   - ⚠️ Some unit tests may fail due to changed signature.
   - ⚠️ Cache warmup callers using older API may break.
   - ⚠️ Backward incompatible API change for external integrations.
   ```
   </details>
   
   ```suggestion
           self, user: User | None = None, window_size: WindowSize | None = None
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Inspect the updated method signature in file 
superset/utils/screenshots.py at the lines
   shown in the PR (def get_screenshot(...) at 
superset/utils/screenshots.py:211-216). The
   current signature requires a positional/keyword parameter user (no default).
   
   2. Reproduce directly in a Python REPL (or unit test) by instantiating a 
screenshot class
   and calling the method without a user: e.g.
   
      - from superset.utils.screenshots import DashboardScreenshot
   
   - s = DashboardScreenshot("http://example", digest=None)
   
      - s.get_screenshot() # <-- call with no args
   
      Expected outcome: Python raises TypeError: get_screenshot() missing 1 
required
      positional argument: 'user' because the method requires user.
   
   3. Confirm an existing internal caller in the same file uses the keyword 
user:
   compute_and_cache calls self.get_screenshot(user=user, 
window_size=window_size) (see
   superset/utils/screenshots.py around compute_and_cache where get_screenshot 
is invoked
   with user parameter). That caller continues to work.
   
   4. However, any external code (scripts, tests, or utilities) or older code 
paths that
   relied on the previous signature get_screenshot(window_size: WindowSize | 
None = None) and
   called it positionally (s.get_screenshot(window_size)) or with no arguments 
will now fail
   at runtime with TypeError. This is reproduced by replacing step 2 call with
   s.get_screenshot(None) vs s.get_screenshot(window_size=(800,600)) to see the 
positional vs
   keyword behavior change.
   
   Explanation: The change is a breaking API change for callers that omitted 
user; making
   user optional (default None) preserves backward compatibility while allowing 
explicit user
   passing for authentication.
   ```
   </details>
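The backward-compatible signature the suggestion proposes can be sketched with simplified stand-ins (this is not the actual Superset class; the fallback user and return value are illustrative assumptions):

```python
# Hedged sketch: giving `user` a default of None preserves the old
# no-argument call style while still allowing an explicit user.

from typing import Optional, Tuple

WindowSize = Tuple[int, int]

class BaseScreenshot:
    # Simplified stand-in; the real class takes a URL/digest and drives a browser.
    def get_screenshot(
        self, user: Optional[str] = None, window_size: Optional[WindowSize] = None
    ) -> bytes:
        # Assumed fallback when no user is supplied, keeping legacy callers working.
        effective_user = user or "cache_warmup_default"
        size = window_size or (800, 600)
        return f"{effective_user}:{size[0]}x{size[1]}".encode()

s = BaseScreenshot()
old_style = s.get_screenshot()  # legacy no-argument call: no TypeError
new_style = s.get_screenshot(user="alice", window_size=(1024, 768))
```

With the required-`user` signature from the diff, the `old_style` call above would instead raise `TypeError: get_screenshot() missing 1 required positional argument: 'user'`.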
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/screenshots.py
   **Line:** 212:212
   **Comment:**
        *Logic Error: The new `get_screenshot` signature now requires a `user` 
argument with no default, which is a breaking change from the previous API 
(`get_screenshot(window_size: WindowSize | None = None)`); any existing callers 
that omit `user` (or call it positionally with just `window_size`) will now 
fail with a `TypeError`, so the `user` parameter should be made optional with a 
default (e.g. `None`) to preserve backward compatibility while still allowing 
explicit users for authentication.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
docs/docs/configuration/cache.mdx:
##########
@@ -80,6 +80,30 @@ instead requires a cachelib object.
 
 See [Async Queries via Celery](/docs/configuration/async-queries-celery) for 
details.
 
+## Celery beat
+
+Superset has a Celery task that will periodically warm up the cache based on different strategies.
+To use it, add the following to the `CELERYBEAT_SCHEDULE` section in `config.py`:
+
+```python
+SUPERSET_CACHE_WARMUP_USER = "user_with_permission_to_dashboards"

Review Comment:
   **Suggestion:** The example Celery beat configuration uses `crontab` without 
importing it, so copying this snippet into a config file as-is will raise a 
`NameError` at import time because `crontab` is undefined. [possible bug]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Process import fails with NameError on configuration load.
   - ⚠️ Celerybeat schedule will not be registered.
   - ⚠️ Operators copying example will see startup errors.
   ```
   </details>
   
   ```suggestion
   from celery.schedules import crontab
   
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Copy the example block from `docs/docs/configuration/cache.mdx` lines 
89-101 into a
   project configuration file (e.g., `superset_config.py`). The snippet at
   `docs/docs/configuration/cache.mdx:89` contains `schedule: crontab(minute=0, 
hour='*')`
   but no `crontab` import.
   
   2. Start Superset (process imports `superset_config.py` on startup). Python 
evaluates
   module-level top-level code and attempts to resolve `crontab` during import.
   
   3. Because `crontab` is undefined, Python raises NameError at import time, 
causing the
   process that imported the config (webserver or Celery worker) to fail to 
start or fail to
   load the schedule. The error occurs immediately when the module is imported.
   
   4. Expected observable outcome: process fails with traceback showing 
NameError: name
   'crontab' is not defined originating from the copied config block (matching 
line with
   `schedule: crontab(...)`).
   
   Note: This is reproducible by pasting the docs snippet as-is into any 
Superset config file
   that is imported on startup.
   ```
   </details>
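Putting the suggested import together with the documented settings, the corrected config fragment would look roughly like this (the schedule entry name and `kwargs` here are illustrative, not quoted from the docs; only `SUPERSET_CACHE_WARMUP_USER` and the `crontab(minute=0, hour='*')` schedule appear in the snippet under review):

```python
from celery.schedules import crontab  # the import the docs snippet is missing

SUPERSET_CACHE_WARMUP_USER = "user_with_permission_to_dashboards"

CELERYBEAT_SCHEDULE = {
    "cache-warmup-hourly": {          # illustrative entry name
        "task": "cache-warmup",
        "schedule": crontab(minute=0, hour="*"),  # hourly, as in the docs
        "kwargs": {"strategy_name": "dummy"},     # illustrative kwargs
    },
}
```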
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** docs/docs/configuration/cache.mdx
   **Line:** 89:89
   **Comment:**
        *Possible Bug: The example Celery beat configuration uses `crontab` 
without importing it, so copying this snippet into a config file as-is will 
raise a `NameError` at import time because `crontab` is undefined.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/utils/webdriver.py:
##########
@@ -592,10 +624,14 @@ def find_unexpected_errors(driver: WebDriver) -> list[str]:
 
         return error_messages
 
-    def get_screenshot(self, url: str, element_name: str, user: User) -> bytes | None:  # noqa: C901
-        driver = self.auth(user)
-        driver.set_window_size(*self._window)
-        driver.get(url)
+    def get_screenshot(  # noqa: C901
+        self, url: str, element_name: str, user: User | None = None
+    ) -> bytes | None:
+        if user and not self._user:

Review Comment:
   **Suggestion:** Once a Selenium driver has been initialized with one user, 
subsequent calls to `get_screenshot` that pass a different user are silently 
ignored, so the browser keeps using the first user session and dashboards are 
rendered/caches warmed for the wrong user, which breaks the intent of per-call 
authentication and can lead to incorrect cache contents. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Cache warmup generates caches for wrong user contexts.
   - ⚠️ Dashboard screenshots use stale user sessions.
   - ⚠️ Permission-sensitive dashboards may be skipped/mis-cached.
   ```
   </details>
   
   ```suggestion
       if user:
           # Re-authenticate the driver when a different user is passed in
           if user != self._user:
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Locate the Selenium screenshot implementation in 
`superset/utils/webdriver.py` at the
   method WebDriverSelenium.get_screenshot (see file, lines 627-634). The 
method begins with
   the existing guard:
   
      - 627: if user and not self._user:
      - 628:     self._user = user
      - 629:     if self._driver:
      - 630:         self._auth(user)
      - 631-634: ... self.driver.get(url)
   
   2. Start a process that reuses a single WebDriverSelenium instance across 
multiple
   screenshot calls (the class keeps a persistent driver in self._driver and 
persistent user
   in self._user; see class fields at lines ~404-414 in the same file).
   
   3. First call: call WebDriverSelenium.get_screenshot(url1, element_name, 
user=A). Because
   self._user is initially None, the condition `if user and not self._user` is 
true and
   authentication runs for user A (self._user set to A and self._auth called). 
The driver now
   holds user A session.
   
   4. Second call: call WebDriverSelenium.get_screenshot(url2, element_name, 
user=B) where B
   != A. Execution reaches the same guard and evaluates `if user and not 
self._user` — since
   self._user is already set (A), the condition is false, so the code does NOT
   re-authenticate for B and the browser continues using user A's session; then
   self.driver.get(url2) loads the page as A.
   
   5. Observe that dashboards/pages requested with user B are rendered as user 
A; caches
   warmed will be for A's view/permissions. This reproduces incorrect per-call 
authentication
   behavior and can be verified by (a) inspecting network/session cookies in 
the browser
   process, or (b) requesting a dashboard only visible to B and seeing it 
instead render A's
   view or raise permission errors.
   
   6. Real-world trigger: cache-warmup task (PR summary mentions 
`superset/tasks/cache.py`)
   or any caller that reuses the same WebDriverSelenium instance but passes 
different users
   across calls will exhibit this behavior. The driver lifecycle and user 
storage are in
   `superset/utils/webdriver.py` (class WebDriverSelenium, constructor lines 
~404-414).
   
   Explanation: the existing code only sets/authorizes when self._user is 
unset, so passing a
   different user later is silently ignored. The suggested improved_code 
re-authenticates
   when a different user is provided.
   ```
   </details>
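The stale-session behavior and the proposed fix can be modeled with stand-in classes (this is a hypothetical sketch, not Superset's `WebDriverSelenium`):

```python
# Hedged sketch of the re-authentication guard the review proposes:
# re-auth whenever a *different* user is passed, instead of only when
# no user has been set yet.

from typing import List, Optional

class FakeDriver:
    """Stand-in browser that records which users it authenticated as."""
    def __init__(self) -> None:
        self.auth_log: List[str] = []

class ScreenshotDriver:
    def __init__(self) -> None:
        self._user: Optional[str] = None
        self._driver = FakeDriver()

    def _auth(self, user: str) -> None:
        self._driver.auth_log.append(user)

    def get_screenshot(self, url: str, user: Optional[str] = None) -> str:
        # Re-authenticate when a different user is passed in, so a reused
        # driver never renders pages with a stale session.
        if user and user != self._user:
            self._user = user
            self._auth(user)
        return f"{self._user}@{url}"

d = ScreenshotDriver()
first = d.get_screenshot("/dash/1/", user="alice")
second = d.get_screenshot("/dash/2/", user="bob")  # triggers re-auth as bob
third = d.get_screenshot("/dash/3/")               # no user passed: keeps bob
```

Under the original `if user and not self._user` guard, the second call would silently keep alice's session, which is exactly the wrong-user cache contents the review describes.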
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/webdriver.py
   **Line:** 627:630
   **Comment:**
        *Logic Error: Once a Selenium driver has been initialized with one 
user, subsequent calls to `get_screenshot` that pass a different user are 
silently ignored, so the browser keeps using the first user session and 
dashboards are rendered/caches warmed for the wrong user, which breaks the 
intent of per-call authentication and can lead to incorrect cache contents.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
tests/integration_tests/strategy_tests.py:
##########
@@ -82,15 +82,9 @@ def test_top_n_dashboards_strategy(self):
             self.client.get(f"/superset/dashboard/{dash.id}/")
 
         strategy = TopNDashboardsStrategy(1)
-        result = strategy.get_tasks()
-        expected = [
-            {
-                "payload": {"chart_id": chart.id, "dashboard_id": dash.id},
-                "username": "admin",
-            }
-            for chart in dash.slices
-        ]
-        assert len(result) == len(expected)
+        result = sorted(strategy.get_urls())
+        expected = sorted([f"http://localhost{dash.url}"])

Review Comment:
   **Suggestion:** The expected URL in the top-N dashboards test is built by 
hardcoding the "http://localhost" prefix, which will break as soon as the 
application's configured base URL (including scheme, host, or port) differs 
from this hardcoded value; using the existing `get_url_host` helper to 
construct the URL ensures the test matches the actual URLs produced by the 
strategy. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Integration test failure in CI when BASE URL differs.
   - ⚠️ Flaky tests across developer environments.
   - ⚠️ Slows down developer feedback loop.
   ```
   </details>
   
   ```suggestion
           expected = sorted([get_url_host(dash.url)])
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Run the test suite that includes 
tests/integration_tests/strategy_tests.py (file under
   test). The test function is test_top_n_dashboards_strategy in this file (see
   tests/integration_tests/strategy_tests.py).
   
   2. Within test_top_n_dashboards_strategy
   (tests/integration_tests/strategy_tests.py:84-87), the code instantiates
   TopNDashboardsStrategy and calls strategy.get_urls() which returns absolute 
URLs built by
   the production helper (the strategy implementation uses the application 
configuration to
   build full URLs).
   
   3. The test compares the returned URLs (result) against an expected list built with a
   hardcoded "http://localhost" prefix (line shown at
   tests/integration_tests/strategy_tests.py:86). If the application's configured
   host/scheme/port differs from "http://localhost" (for example CI or developer uses a
   different BASE_URL), strategy.get_urls() will produce URLs that do not match
   "http://localhost{dash.url}", causing the assertion at
   tests/integration_tests/strategy_tests.py:87 to fail.
   
   4. Reproducing locally: change the application base URL (or run tests in an environment
   where get_url_host uses e.g., "https://example:8080") so that get_url_host(dash.url) !=
   "http://localhost{dash.url}". Run the single test test_top_n_dashboards_strategy and
   observe assertion failure comparing result to hardcoded expected value at
   tests/integration_tests/strategy_tests.py:86-87.
   
   5. Rationale why suggested change fixes it: replacing the hardcoded prefix 
with
   get_url_host(dash.url) uses the same helper the codebase exposes for 
constructing
   host-aware URLs (get_url_host is imported at top of the file), ensuring the 
expected value
   matches the actual URL construction logic used by the strategy.
   ```
   </details>
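The environment sensitivity described above is easy to demonstrate. In this hypothetical sketch, `get_url_host` is a simplified stand-in for the Superset helper (the real one reads the configured base URL from application config):

```python
# Hedged sketch: deriving the expected URL from the same helper as the code
# under test keeps the assertion environment-independent; a hardcoded
# "http://localhost" prefix drifts as soon as the base URL changes.

BASE_URL = "https://ci-host:8080"  # imagine a non-default CI environment

def get_url_host(path: str) -> str:
    # Stand-in for the real helper: prefix a relative path with the base URL.
    return f"{BASE_URL}{path}"

dash_url = "/superset/dashboard/births/"

hardcoded = f"http://localhost{dash_url}"  # breaks off-localhost
helper_based = get_url_host(dash_url)      # tracks the configuration
```

Here `helper_based` matches whatever the strategy would produce in this environment, while `hardcoded` only matches when the app happens to run at `http://localhost`.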
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/integration_tests/strategy_tests.py
   **Line:** 86:86
   **Comment:**
        *Logic Error: The expected URL in the top-N dashboards test is built by 
hardcoding the "http://localhost" prefix, which will break as soon as the 
application's configured base URL (including scheme, host, or port) differs 
from this hardcoded value; using the existing `get_url_host` helper to 
construct the URL ensures the test matches the actual URLs produced by the 
strategy.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
docs/docs/configuration/cache.mdx:
##########
@@ -80,6 +80,30 @@ instead requires a cachelib object.
 
 See [Async Queries via Celery](/docs/configuration/async-queries-celery) for 
details.
 
+## Celery beat
+
+Superset has a Celery task that will periodically warm up the cache based on different strategies.
+To use it, add the following to the `CELERYBEAT_SCHEDULE` section in `config.py`:

Review Comment:
   **Suggestion:** The text instructs users to add the Celery beat 
configuration to `config.py`, but Superset actually reads configuration from 
`superset_config.py`, so following this documentation literally will result in 
the cache warmup never being scheduled. [possible bug]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Cache warmup never scheduled by Celerybeat.
   - ⚠️ Admins may believe feature configured when ignored.
   - ⚠️ Troubleshooting time increases for operators.
   ```
   </details>
   
   ```suggestion
   To use it, add the following to the `CELERYBEAT_SCHEDULE` section in `superset_config.py`:
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Follow the documentation in `docs/docs/configuration/cache.mdx` line 86 
and add the
   example `CELERYBEAT_SCHEDULE` to a file named `config.py` in your project 
root.
   
   2. Start Superset webserver and Celery workers. Superset conventionally 
loads user
   configuration from `superset_config.py` (the common, documented entrypoint 
for runtime
   configuration).
   
   3. Because the settings were placed in `config.py` (a different filename), 
Superset will
   not import them and the Celery beat schedule will not be registered; the 
cache-warmup task
   will never be scheduled.
   
   4. Observable result: no cache-warmup tasks appear in Celerybeat, and 
dashboard caches
   remain unwarmed despite the administrator believing configuration was 
applied.
   
   Note: The docs explicitly instruct editing a config filename; using the 
wrong filename is
   a realistic and common user error when docs use an incorrect filename.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** docs/docs/configuration/cache.mdx
   **Line:** 86:86
   **Comment:**
        *Possible Bug: The text instructs users to add the Celery beat 
configuration to `config.py`, but Superset actually reads configuration from 
`superset_config.py`, so following this documentation literally will result in 
the cache warmup never being scheduled.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
tests/integration_tests/strategy_tests.py:
##########
@@ -108,39 +102,27 @@ def test_dashboard_tags_strategy(self):
         self.reset_tag(tag1)
 
         strategy = DashboardTagsStrategy(["tag1"])
-        assert strategy.get_tasks() == []
+        result = sorted(strategy.get_urls())
+        expected = []
+        assert result == expected
 
         # tag dashboard 'births' with `tag1`
         tag1 = get_tag("tag1", db.session, TagType.custom)
         dash = self.get_dash_by_slug("births")
-        tag1_payloads = [{"chart_id": chart.id} for chart in dash.slices]
+        tag1_urls = [f"http://localhost{dash.url}"]
         tagged_object = TaggedObject(

Review Comment:
   **Suggestion:** In the dashboard tags strategy test, the expected URL is 
again constructed with a hardcoded "http://localhost" prefix, so if the 
application's base URL (including port) is different from this, the assertion 
comparing these expected URLs with `strategy.get_urls()` will fail even though 
the strategy is correct; using `get_url_host` keeps the expectation aligned 
with actual behavior. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Integration test failure in CI when BASE URL differs.
   - ⚠️ Flaky tests across developer environments.
   - ⚠️ Causes misleading test failures unrelated to feature logic.
   ```
   </details>
   
   ```suggestion
           tag1_urls = [get_url_host(dash.url)]
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Run the test_dashboard_tags_strategy test defined in
   tests/integration_tests/strategy_tests.py. The relevant block begins with 
tagging a
   dashboard and asserting the strategy returns the expected URL (see
   tests/integration_tests/strategy_tests.py:110-121).
   
   2. The test creates or retrieves a dashboard (dash = 
self.get_dash_by_slug("births") at
   tests/integration_tests/strategy_tests.py:111) and constructs tag1_urls 
using a hardcoded
   "http://localhost" prefix (tests/integration_tests/strategy_tests.py:112). 
Meanwhile,
   strategy.get_urls() returns URLs constructed using the application's URL
   helper/configuration.
   
   3. If the application under test uses a different host/scheme/port than 
"http://localhost"
   (e.g., running tests in CI with a different BASE_URL), the returned list from
   strategy.get_urls() will not equal the hardcoded tag1_urls and the assertion 
at
   tests/integration_tests/strategy_tests.py:121 will fail.
   
   4. Reproduce by setting the app base URL to a value other than localhost (or 
running in an
   environment where get_url_host returns a different host), then run only
   test_dashboard_tags_strategy and observe assertion failure comparing result 
to the
   hardcoded tag1_urls at tests/integration_tests/strategy_tests.py:121.
   
   5. Using get_url_host(dash.url) makes the expected value use the same 
host-construction
   logic as the code under test, avoiding environment-dependent mismatches.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/integration_tests/strategy_tests.py
   **Line:** 112:113
   **Comment:**
        *Logic Error: In the dashboard tags strategy test, the expected URL is 
again constructed with a hardcoded "http://localhost" prefix, so if the 
application's base URL (including port) is different from this, the assertion 
comparing these expected URLs with `strategy.get_urls()` will fail even though 
the strategy is correct; using `get_url_host` keeps the expectation aligned 
with actual behavior.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

