codeant-ai-for-open-source[bot] commented on code in PR #41243:
URL: https://github.com/apache/superset/pull/41243#discussion_r3441469228


##########
superset/utils/webdriver.py:
##########
@@ -117,6 +118,48 @@ def check_playwright_availability() -> bool:
 PLAYWRIGHT_AVAILABLE = check_playwright_availability()
 
 
+class _PlaywrightBrowserManager:
+    """Manages a long-lived Playwright browser instance per worker process.
+
+    In Celery's prefork model, each worker process runs tasks sequentially,
+    so a single browser instance per process is safe and avoids the overhead
+    of launching/destroying Chromium on every screenshot task. Each task
+    creates a lightweight, isolated browser context instead of a full browser.
+    """
+
+    def __init__(self) -> None:
+        self._playwright: Any = None
+        self._browser: Any = None
+
+    def get_browser(self, browser_args: list[str]) -> Any:
+        """Return a reusable browser, creating one if needed."""
+        if self._browser is not None and self._browser.is_connected():
+            return self._browser
+
+        self._cleanup()
+        self._playwright = sync_playwright().start()
+        self._browser = self._playwright.chromium.launch(args=browser_args)

Review Comment:
   **Suggestion:** `_PlaywrightBrowserManager.get_browser` can leak a started 
Playwright driver when Chromium launch fails: `sync_playwright().start()` is 
called before `chromium.launch(...)`, but there is no cleanup on launch 
exceptions. Wrap launch in a try/except and call `_cleanup()` (or 
`self._playwright.stop()`) before re-raising so failed launches do not leave a 
live Playwright process in the worker. [resource leak]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Thumbnail Celery tasks leak Playwright driver between failed launches.
   - ⚠️ Worker process holds unnecessary Playwright resources until next 
attempt.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Enable Playwright thumbnails so `BaseScreenshot.driver()` in
   `superset/utils/screenshots.py:12-20` returns `WebDriverPlaywright` when
   `PLAYWRIGHT_REPORTS_AND_THUMBNAILS` is enabled and `PLAYWRIGHT_AVAILABLE` is 
True.
   
   2. Configure `WEBDRIVER_OPTION_ARGS` (used at 
`superset/utils/webdriver.py:283-285`) with
   an invalid Chromium flag that causes `p.chromium.launch(args=browser_args)` 
to raise (for
   example, an unknown command-line option).
   
   3. Trigger a thumbnail Celery task such as `cache_chart_thumbnail` in
   `superset/tasks/thumbnails.py:37-44`, which creates a `ChartScreenshot` and 
calls
   `BaseScreenshot.get_screenshot()` (`superset/utils/screenshots.py:32-41`), 
which in turn
   calls `WebDriverPlaywright.get_screenshot()` 
(`superset/utils/webdriver.py:271-283`).
   
   4. Inside `get_screenshot()`, `_PlaywrightBrowserManager.get_browser()`
   (`superset/utils/webdriver.py:134-142`) executes `self._playwright =
   sync_playwright().start()` at line 140 and then raises from 
`chromium.launch(...)` at line
   141; the exception propagates out without calling `_cleanup()`, so the 
started Playwright
   driver process remains running until either the worker later calls 
`get_browser()` again
   or the process exits, leaking resources between failures.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=7c884d8f6f7f4d7b8e0680347e3cc34b&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=7c884d8f6f7f4d7b8e0680347e3cc34b&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/webdriver.py
   **Line:** 140:141
   **Comment:**
        *Resource Leak: `_PlaywrightBrowserManager.get_browser` can leak a 
started Playwright driver when Chromium launch fails: 
`sync_playwright().start()` is called before `chromium.launch(...)`, but there 
is no cleanup on launch exceptions. Wrap launch in a try/except and call 
`_cleanup()` (or `self._playwright.stop()`) before re-raising so failed 
launches do not leave a live Playwright process in the worker.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=76f1a46e83c0ab866723abe043fb001367c8f6d63cd185fdfba70fb709c17adf&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=76f1a46e83c0ab866723abe043fb001367c8f6d63cd185fdfba70fb709c17adf&reaction=dislike'>👎</a>



##########
superset/utils/webdriver.py:
##########
@@ -237,26 +280,25 @@ def get_screenshot(  # pylint: disable=too-many-locals, 
too-many-statements  # n
             )
             return None
 
-        with sync_playwright() as playwright:
-            browser_args = app.config["WEBDRIVER_OPTION_ARGS"]
-            browser = playwright.chromium.launch(args=browser_args)
-            pixel_density = 
app.config["WEBDRIVER_WINDOW"].get("pixel_density", 1)
-            viewport_height = self._window[1]
-            viewport_width = self._window[0]
-            context = browser.new_context(
-                bypass_csp=True,
-                viewport={
-                    "height": viewport_height,
-                    "width": viewport_width,
-                },
-                device_scale_factor=pixel_density,
-            )
-            context.set_default_timeout(
-                app.config["SCREENSHOT_PLAYWRIGHT_DEFAULT_TIMEOUT"]
-            )
-            if user:
-                self.auth(user, context)
-            page = context.new_page()
+        browser_args = app.config["WEBDRIVER_OPTION_ARGS"]
+        browser = _browser_manager.get_browser(browser_args)
+        pixel_density = app.config["WEBDRIVER_WINDOW"].get("pixel_density", 1)
+        viewport_height = self._window[1]
+        viewport_width = self._window[0]
+        context = browser.new_context(
+            bypass_csp=True,
+            viewport={
+                "height": viewport_height,
+                "width": viewport_width,
+            },
+            device_scale_factor=pixel_density,
+        )
+        
context.set_default_timeout(app.config["SCREENSHOT_PLAYWRIGHT_DEFAULT_TIMEOUT"])
+        if user:
+            self.auth(user, context)
+        page = context.new_page()
+        img: bytes | None = None
+        try:

Review Comment:
   **Suggestion:** Browser/context/page setup now happens before the protected 
`try` block, so failures in browser retrieval, context creation, auth, or page 
creation bypass the existing Playwright error handling and bubble up as 
uncaught task exceptions. Move this initialization inside the same guarded 
block (or add a dedicated outer handler) to preserve graceful logging/recovery 
behavior. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Thumbnail tasks crash on browser/context/auth creation errors.
   - ⚠️ Playwright initialization failures bypass existing error logging.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. With Playwright installed and `PLAYWRIGHT_REPORTS_AND_THUMBNAILS` 
enabled, a thumbnail
   task like `cache_chart_thumbnail` in `superset/tasks/thumbnails.py:37-44` 
builds a
   `ChartScreenshot` and calls `BaseScreenshot.get_screenshot()` in
   `superset/utils/screenshots.py:32-41`, which instantiates 
`WebDriverPlaywright` and
   invokes its `get_screenshot()` (`superset/utils/webdriver.py:271-283`).
   
   2. In `WebDriverPlaywright.get_screenshot()`, the code at
   `superset/utils/webdriver.py:283-300` retrieves browser args, calls
   `_browser_manager.get_browser(browser_args)`, creates a new context via
   `browser.new_context(...)`, sets timeouts, optionally authenticates with 
`self.auth(user,
   context)`, and creates a new page via `context.new_page()`, all before the 
inner `try:` at
   line 301.
   
   3. Cause an initialization error before the `try:` block, for example by 
having
   `machine_auth_provider_factory.instance.authenticate_browser_context()` 
(called via
   `self.auth(user, context)` at `superset/utils/webdriver.py:297-298`) raise 
due to a
   misconfigured machine auth provider, or by having Playwright raise from
   `browser.new_context(...)` when system resources are exhausted.
   
   4. Observe that the exception from initialization is not caught by the 
`PlaywrightError`
   handler at `superset/utils/webdriver.py:75-80` (which only wraps the inner 
body after line
   301); instead it propagates out to callers like 
`BaseScreenshot.get_screenshot()` and then
   to Celery tasks in `superset/tasks/thumbnails.py`, causing unhandled task 
failures without
   the standardized "Encountered an unexpected error when requesting url" 
logging that is
   applied to later Playwright operations.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=30e0e5678e3d4b2db59b71f9a40dfe99&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=30e0e5678e3d4b2db59b71f9a40dfe99&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/webdriver.py
   **Line:** 283:301
   **Comment:**
        *Logic Error: Browser/context/page setup now happens before the 
protected `try` block, so failures in browser retrieval, context creation, 
auth, or page creation bypass the existing Playwright error handling and bubble 
up as uncaught task exceptions. Move this initialization inside the same 
guarded block (or add a dedicated outer handler) to preserve graceful 
logging/recovery behavior.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=e33b21830de55db4eae35032afc801af15b4d75adb141cb742125581b9bfb047&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=e33b21830de55db4eae35032afc801af15b4d75adb141cb742125581b9bfb047&reaction=dislike'>👎</a>



##########
superset/utils/webdriver.py:
##########
@@ -436,9 +477,9 @@ def get_screenshot(  # pylint: disable=too-many-locals, 
too-many-statements  # n
                 logger.exception(
                     "Encountered an unexpected error when requesting url %s", 
url
                 )
-            finally:
-                browser.close()
-            return img
+        finally:
+            context.close()
+        return img

Review Comment:
   **Suggestion:** `context.close()` is called unguarded in `finally`; if the 
browser/context is already broken (for example after a crash), close can raise 
and override the original flow (including successful screenshot return or 
intended logged error handling). Catch and suppress/log close errors in the 
cleanup path so teardown does not become the failing operation. [missing 
cleanup]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Cleanup exceptions can mask root cause of screenshot failures.
   - ⚠️ Successful screenshots may fail due to context.close errors.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Trigger a thumbnail generation task such as `cache_dashboard_thumbnail` or
   `cache_chart_thumbnail` in `superset/tasks/thumbnails.py:37-44` and 
`95-113`, which flows
   through `BaseScreenshot.get_screenshot()` 
(`superset/utils/screenshots.py:32-41`) into
   `WebDriverPlaywright.get_screenshot()` 
(`superset/utils/webdriver.py:271-383`).
   
   2. During the Playwright interaction inside the inner `try`/`except` at
   `superset/utils/webdriver.py:301-380` (e.g., while waiting for elements or 
taking a
   screenshot), induce a browser/context failure (for example by killing the 
Chromium process
   or provoking a crash) so that the context is left in a broken state after an 
exception is
   raised and logged by the `except PlaywrightError` block at lines 75-80.
   
   3. After exiting the inner `try`, Python executes the `finally:` block at
   `superset/utils/webdriver.py:481-482`, which unconditionally calls 
`context.close()` on
   the broken Playwright context.
   
   4. When `context.close()` raises (typical once the browser or context is 
already crashed),
   this new exception from line 482 overrides both any original 
`PlaywrightError` or a
   successful `img` result, so Celery tasks see a failure originating from 
teardown rather
   than the real screenshot error, and a previously successful `img` would not 
be returned.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=699a63779deb4035ae7bd9e1f1483343&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=699a63779deb4035ae7bd9e1f1483343&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/webdriver.py
   **Line:** 481:482
   **Comment:**
        *Missing Cleanup: `context.close()` is called unguarded in `finally`; 
if the browser/context is already broken (for example after a crash), close can 
raise and override the original flow (including successful screenshot return or 
intended logged error handling). Catch and suppress/log close errors in the 
cleanup path so teardown does not become the failing operation.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=f8f5e6102d31518b7dd8d5a516e749990874a0a92fb58a89a5cc85f8812ee9a2&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=f8f5e6102d31518b7dd8d5a516e749990874a0a92fb58a89a5cc85f8812ee9a2&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to