codeant-ai-for-open-source[bot] commented on code in PR #41243: URL: https://github.com/apache/superset/pull/41243#discussion_r3441469228
########## superset/utils/webdriver.py: ########## @@ -117,6 +118,48 @@ def check_playwright_availability() -> bool: PLAYWRIGHT_AVAILABLE = check_playwright_availability() +class _PlaywrightBrowserManager: + """Manages a long-lived Playwright browser instance per worker process. + + In Celery's prefork model, each worker process runs tasks sequentially, + so a single browser instance per process is safe and avoids the overhead + of launching/destroying Chromium on every screenshot task. Each task + creates a lightweight, isolated browser context instead of a full browser. + """ + + def __init__(self) -> None: + self._playwright: Any = None + self._browser: Any = None + + def get_browser(self, browser_args: list[str]) -> Any: + """Return a reusable browser, creating one if needed.""" + if self._browser is not None and self._browser.is_connected(): + return self._browser + + self._cleanup() + self._playwright = sync_playwright().start() + self._browser = self._playwright.chromium.launch(args=browser_args) Review Comment: **Suggestion:** `_PlaywrightBrowserManager.get_browser` can leak a started Playwright driver when Chromium launch fails: `sync_playwright().start()` is called before `chromium.launch(...)`, but there is no cleanup on launch exceptions. Wrap launch in a try/except and call `_cleanup()` (or `self._playwright.stop()`) before re-raising so failed launches do not leave a live Playwright process in the worker. [resource leak] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ Thumbnail Celery tasks leak Playwright driver between failed launches. - ⚠️ Worker process holds unnecessary Playwright resources until next attempt. ``` </details> <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Enable Playwright thumbnails so `BaseScreenshot.driver()` in `superset/utils/screenshots.py:12-20` returns `WebDriverPlaywright` when `PLAYWRIGHT_REPORTS_AND_THUMBNAILS` is enabled and `PLAYWRIGHT_AVAILABLE` is True. 2. Configure `WEBDRIVER_OPTION_ARGS` (used at `superset/utils/webdriver.py:283-285`) with an invalid Chromium flag that causes `p.chromium.launch(args=browser_args)` to raise (for example, an unknown command-line option). 3. Trigger a thumbnail Celery task such as `cache_chart_thumbnail` in `superset/tasks/thumbnails.py:37-44`, which creates a `ChartScreenshot` and calls `BaseScreenshot.get_screenshot()` (`superset/utils/screenshots.py:32-41`), which in turn calls `WebDriverPlaywright.get_screenshot()` (`superset/utils/webdriver.py:271-283`). 4. Inside `get_screenshot()`, `_PlaywrightBrowserManager.get_browser()` (`superset/utils/webdriver.py:134-142`) executes `self._playwright = sync_playwright().start()` at line 140 and then raises from `chromium.launch(...)` at line 141; the exception propagates out without calling `_cleanup()`, so the started Playwright driver process remains running until either the worker later calls `get_browser()` again or the process exits, leaking resources between failures. ``` </details> [](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=7c884d8f6f7f4d7b8e0680347e3cc34b&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) [](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=7c884d8f6f7f4d7b8e0680347e3cc34b&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) *(Use Cmd/Ctrl + Click for best experience)* <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/utils/webdriver.py **Line:** 140:141 **Comment:** *Resource Leak: `_PlaywrightBrowserManager.get_browser` can leak a started Playwright driver when Chromium launch fails: `sync_playwright().start()` is called before `chromium.launch(...)`, but there is no cleanup on launch exceptions. Wrap launch in a try/except and call `_cleanup()` (or `self._playwright.stop()`) before re-raising so failed launches do not leave a live Playwright process in the worker. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=76f1a46e83c0ab866723abe043fb001367c8f6d63cd185fdfba70fb709c17adf&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=76f1a46e83c0ab866723abe043fb001367c8f6d63cd185fdfba70fb709c17adf&reaction=dislike'>👎</a> ########## superset/utils/webdriver.py: ########## @@ -237,26 +280,25 @@ def get_screenshot( # pylint: disable=too-many-locals, too-many-statements # n ) return None - with sync_playwright() as playwright: - browser_args = app.config["WEBDRIVER_OPTION_ARGS"] - browser = playwright.chromium.launch(args=browser_args) - pixel_density = app.config["WEBDRIVER_WINDOW"].get("pixel_density", 1) - viewport_height = self._window[1] - viewport_width = self._window[0] - context = browser.new_context( - bypass_csp=True, - viewport={ - "height": viewport_height, - "width": viewport_width, - }, - device_scale_factor=pixel_density, - ) - context.set_default_timeout( - app.config["SCREENSHOT_PLAYWRIGHT_DEFAULT_TIMEOUT"] - ) - if user: - self.auth(user, context) - page = context.new_page() + browser_args = app.config["WEBDRIVER_OPTION_ARGS"] + browser = _browser_manager.get_browser(browser_args) + pixel_density = app.config["WEBDRIVER_WINDOW"].get("pixel_density", 1) + viewport_height = self._window[1] + viewport_width = self._window[0] + context = browser.new_context( + bypass_csp=True, + viewport={ + "height": viewport_height, + "width": viewport_width, + }, + device_scale_factor=pixel_density, + ) + context.set_default_timeout(app.config["SCREENSHOT_PLAYWRIGHT_DEFAULT_TIMEOUT"]) + if user: + self.auth(user, context) + page = context.new_page() + img: bytes | None = None + try: Review Comment: **Suggestion:** Browser/context/page setup now happens before the protected `try` block, so failures in browser retrieval, context creation, auth, or page creation bypass the existing Playwright error handling and bubble up as uncaught task exceptions. Move this initialization inside the same guarded block (or add a dedicated outer handler) to preserve graceful logging/recovery behavior. [logic error] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ Thumbnail tasks crash on browser/context/auth creation errors. - ⚠️ Playwright initialization failures bypass existing error logging. ``` </details> <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. With Playwright installed and `PLAYWRIGHT_REPORTS_AND_THUMBNAILS` enabled, a thumbnail task like `cache_chart_thumbnail` in `superset/tasks/thumbnails.py:37-44` builds a `ChartScreenshot` and calls `BaseScreenshot.get_screenshot()` in `superset/utils/screenshots.py:32-41`, which instantiates `WebDriverPlaywright` and invokes its `get_screenshot()` (`superset/utils/webdriver.py:271-283`). 2. In `WebDriverPlaywright.get_screenshot()`, the code at `superset/utils/webdriver.py:283-300` retrieves browser args, calls `_browser_manager.get_browser(browser_args)`, creates a new context via `browser.new_context(...)`, sets timeouts, optionally authenticates with `self.auth(user, context)`, and creates a new page via `context.new_page()`, all before the inner `try:` at line 301. 3. Cause an initialization error before the `try:` block, for example by having `machine_auth_provider_factory.instance.authenticate_browser_context()` (called via `self.auth(user, context)` at `superset/utils/webdriver.py:297-298`) raise due to a misconfigured machine auth provider, or by having Playwright raise from `browser.new_context(...)` when system resources are exhausted. 4. Observe that the exception from initialization is not caught by the `PlaywrightError` handler at `superset/utils/webdriver.py:75-80` (which only wraps the inner body after line 301); instead it propagates out to callers like `BaseScreenshot.get_screenshot()` and then to Celery tasks in `superset/tasks/thumbnails.py`, causing unhandled task failures without the standardized "Encountered an unexpected error when requesting url" logging that is applied to later Playwright operations. ``` </details> [](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=30e0e5678e3d4b2db59b71f9a40dfe99&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) [](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=30e0e5678e3d4b2db59b71f9a40dfe99&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) *(Use Cmd/Ctrl + Click for best experience)* <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/utils/webdriver.py **Line:** 283:301 **Comment:** *Logic Error: Browser/context/page setup now happens before the protected `try` block, so failures in browser retrieval, context creation, auth, or page creation bypass the existing Playwright error handling and bubble up as uncaught task exceptions. Move this initialization inside the same guarded block (or add a dedicated outer handler) to preserve graceful logging/recovery behavior. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=e33b21830de55db4eae35032afc801af15b4d75adb141cb742125581b9bfb047&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=e33b21830de55db4eae35032afc801af15b4d75adb141cb742125581b9bfb047&reaction=dislike'>👎</a> ########## superset/utils/webdriver.py: ########## @@ -436,9 +477,9 @@ def get_screenshot( # pylint: disable=too-many-locals, too-many-statements # n logger.exception( "Encountered an unexpected error when requesting url %s", url ) - finally: - browser.close() - return img + finally: + context.close() + return img Review Comment: **Suggestion:** `context.close()` is called unguarded in `finally`; if the browser/context is already broken (for example after a crash), close can raise and override the original flow (including successful screenshot return or intended logged error handling). Catch and suppress/log close errors in the cleanup path so teardown does not become the failing operation. [missing cleanup] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ Cleanup exceptions can mask root cause of screenshot failures. - ⚠️ Successful screenshots may fail due to context.close errors. ``` </details> <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Trigger a thumbnail generation task such as `cache_dashboard_thumbnail` or `cache_chart_thumbnail` in `superset/tasks/thumbnails.py:37-44` and `95-113`, which flows through `BaseScreenshot.get_screenshot()` (`superset/utils/screenshots.py:32-41`) into `WebDriverPlaywright.get_screenshot()` (`superset/utils/webdriver.py:271-383`). 2. During the Playwright interaction inside the inner `try`/`except` at `superset/utils/webdriver.py:301-380` (e.g., while waiting for elements or taking a screenshot), induce a browser/context failure (for example by killing the Chromium process or provoking a crash) so that the context is left in a broken state after an exception is raised and logged by the `except PlaywrightError` block at lines 75-80. 3. After exiting the inner `try`, Python executes the `finally:` block at `superset/utils/webdriver.py:481-482`, which unconditionally calls `context.close()` on the broken Playwright context. 4. When `context.close()` raises (typical once the browser or context is already crashed), this new exception from line 482 overrides both any original `PlaywrightError` or a successful `img` result, so Celery tasks see a failure originating from teardown rather than the real screenshot error, and a previously successful `img` would not be returned. ``` </details> [](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=699a63779deb4035ae7bd9e1f1483343&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) [](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=699a63779deb4035ae7bd9e1f1483343&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) *(Use Cmd/Ctrl + Click for best experience)* <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/utils/webdriver.py **Line:** 481:482 **Comment:** *Missing Cleanup: `context.close()` is called unguarded in `finally`; if the browser/context is already broken (for example after a crash), close can raise and override the original flow (including successful screenshot return or intended logged error handling). Catch and suppress/log close errors in the cleanup path so teardown does not become the failing operation. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=f8f5e6102d31518b7dd8d5a516e749990874a0a92fb58a89a5cc85f8812ee9a2&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41243&comment_hash=f8f5e6102d31518b7dd8d5a516e749990874a0a92fb58a89a5cc85f8812ee9a2&reaction=dislike'>👎</a> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
