codeant-ai-for-open-source[bot] commented on code in PR #38576:
URL: https://github.com/apache/superset/pull/38576#discussion_r3482298974


##########
superset/utils/screenshots.py:
##########
@@ -280,43 +281,66 @@ def compute_and_cache(  # pylint: 
disable=too-many-arguments
         :return: Image payload
         """
         cache_key = cache_key or self.get_cache_key(window_size, thumb_size)
-        cache_payload = self.get_from_cache_key(cache_key) or 
ScreenshotCachePayload()
-        if not cache_payload.should_trigger_task(force=force):
-            logger.info(
-                "Skipping compute - already processed for thumbnail: %s", 
cache_key
-            )
-            return
-
-        window_size = window_size or self.window_size
-        thumb_size = thumb_size or self.thumb_size
-        logger.info("Processing url for thumbnail: %s", cache_key)
-        cache_payload.computing()
-        image = None
-        # Assuming all sorts of things can go wrong with Selenium
         try:
-            logger.info("trying to generate screenshot")
-            with 
event_logger.log_context(f"screenshot.compute.{self.thumbnail_type}"):
-                image = self.get_screenshot(user=user, window_size=window_size)
-        except Exception as ex:  # pylint: disable=broad-except
-            logger.warning("Failed at generating thumbnail %s", ex, 
exc_info=True)
-            cache_payload.error()
-        if image and window_size != thumb_size:
-            try:
-                image = self.resize_image(image, thumb_size=thumb_size)
-            except Exception as ex:  # pylint: disable=broad-except
-                logger.warning("Failed at resizing thumbnail %s", ex, 
exc_info=True)
-                cache_payload.error()
+            with DistributedLock(
+                namespace="thumbnail",
+                key=cache_key,
+                ttl_seconds=app.config["THUMBNAIL_COMPUTING_CACHE_TTL"],
+            ):
+                cache_payload = (
+                    self.get_from_cache_key(cache_key) or 
ScreenshotCachePayload()
+                )
+                if not cache_payload.should_trigger_task(force=force):
+                    logger.info(
+                        "Skipping compute - already processed for thumbnail: 
%s",
+                        cache_key,
+                    )
+                    return
+
+                window_size = window_size or self.window_size
+                thumb_size = thumb_size or self.thumb_size
+                logger.info("Processing url for thumbnail: %s", cache_key)
+                cache_payload.computing()
+                self.cache.set(cache_key, cache_payload.to_dict())
                 image = None
-
-        # Cache the result (success or error) to avoid immediate retries
-        if image:
-            with 
event_logger.log_context(f"screenshot.cache.{self.thumbnail_type}"):
-                cache_payload.update(image)
-
-        logger.info("Caching thumbnail: %s", cache_key)
-        self.cache.set(cache_key, cache_payload.to_dict())
-        logger.info("Updated thumbnail cache; Status: %s", 
cache_payload.get_status())
-        return
+                # Assuming all sorts of things can go wrong with Selenium
+                try:
+                    logger.info("trying to generate screenshot")
+                    with event_logger.log_context(
+                        f"screenshot.compute.{self.thumbnail_type}"
+                    ):
+                        image = self.get_screenshot(user=user, 
window_size=window_size)
+                except Exception as ex:  # pylint: disable=broad-except
+                    logger.warning(
+                        "Failed at generating thumbnail %s", ex, exc_info=True
+                    )
+                    cache_payload.error()
+                if image and window_size != thumb_size:
+                    try:
+                        image = self.resize_image(image, thumb_size=thumb_size)
+                    except Exception as ex:  # pylint: disable=broad-except
+                        logger.warning(
+                            "Failed at resizing thumbnail %s", ex, 
exc_info=True
+                        )
+                        cache_payload.error()
+                        image = None
+
+                if image:
+                    with event_logger.log_context(
+                        f"screenshot.cache.{self.thumbnail_type}"
+                    ):
+                        cache_payload.update(image)
+
+                logger.info("Caching thumbnail: %s", cache_key)
+                self.cache.set(cache_key, cache_payload.to_dict())
+                logger.info(
+                    "Updated thumbnail cache; Status: %s", 
cache_payload.get_status()
+                )
+        except AcquireDistributedLockFailedException:
+            logger.info(
+                "Skipping duplicate thumbnail task for %s - lock already held",
+                cache_key,
+            )

Review Comment:
   **Suggestion:** This handler swallows every 
`AcquireDistributedLockFailedException`, but lock acquisition raises that same 
exception both for normal lock contention and for Redis/backend failures. As 
written, infrastructure failures are silently treated as "duplicate task" and 
the thumbnail is never generated or retried, causing persistent 202 responses 
without recovery. Only suppress true lock-contention cases and re-raise 
backend/acquire errors so Celery can retry and operators can see failures. 
[incorrect condition logic]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Dashboard thumbnail endpoint can return 202 indefinitely.
   - ⚠️ Celery thumbnail tasks hide Redis/DB lock failures.
   - ⚠️ Operators lose visibility into lock backend outages.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call the dashboard thumbnail endpoint `GET 
/api/v1/dashboard/<pk>/thumbnail/<digest>/`
   implemented in `superset/dashboards/api.py:1584-1699`. The handler constructs
   `DashboardScreenshot` and a `cache_key`, then loads `ScreenshotCachePayload` 
from cache or
   creates a new one and checks `should_trigger_task()` at
   `superset/dashboards/api.py:75-85`.
   
   2. When `cache_payload.should_trigger_task()` returns True (status PENDING 
or stale), the
   API enqueues the Celery task `cache_dashboard_thumbnail` via `.delay(...)` at
   `superset/dashboards/api.py:86-95`, which executes 
`superset/tasks/thumbnails.py:77-114`.
   Inside this task, `DashboardScreenshot.compute_and_cache(...)` is invoked at
   `superset/tasks/thumbnails.py:104-114`.
   
   3. `compute_and_cache()` in `superset/utils/screenshots.py:26-105` wraps its 
body in
   `DistributedLock(namespace="thumbnail", key=cache_key,
   ttl_seconds=THUMBNAIL_COMPUTING_CACHE_TTL)` at lines 45-50. The 
`DistributedLock` context
   manager calls `AcquireDistributedLock.run()` which in turn either 
`_acquire_redis` or
   `_acquire_kv` at `superset/distributed_lock/__init__.py:52-60` and
   `superset/commands/distributed_lock/acquire.py:29-35`.
   
   4. If the lock backend fails (e.g., Redis connection error in 
`_acquire_redis`, caught and
   re-raised as `AcquireDistributedLockFailedException` at
   `superset/commands/distributed_lock/acquire.py:57-60`, or a DB/KeyValue 
error re-raised as
   `AcquireDistributedLockFailedException` via the `@transaction(on_error=...,
   reraise=AcquireDistributedLockFailedException)` decorator at
   `superset/commands/distributed_lock/acquire.py:63-72`), 
`compute_and_cache()` catches this
   in the `except AcquireDistributedLockFailedException:` block at
   `superset/utils/screenshots.py:100-104` and only logs "Skipping duplicate 
thumbnail task
   for %s - lock already held" before returning. The Celery task completes 
successfully with
   no cache update, leaving the status as PENDING; subsequent calls to the 
thumbnail API
   continue to enqueue tasks that immediately hit the same backend failure and 
are silently
   treated as duplicates, so clients persistently receive 202 responses 
indicating async
   computation without any thumbnail ever being generated or retried.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=cf1d807b33cd451d8fe4585185ff8fad&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=cf1d807b33cd451d8fe4585185ff8fad&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/screenshots.py
   **Line:** 339:343
   **Comment:**
        *Incorrect Condition Logic: This handler swallows every 
`AcquireDistributedLockFailedException`, but lock acquisition raises that same 
exception both for normal lock contention and for Redis/backend failures. As 
written, infrastructure failures are silently treated as "duplicate task" and 
the thumbnail is never generated or retried, causing persistent 202 responses 
without recovery. Only suppress true lock-contention cases and re-raise 
backend/acquire errors so Celery can retry and operators can see failures.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38576&comment_hash=653f93898c1d57423a76e3a90231827bfd3704a50a87652bae178ae02c5cd1d5&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38576&comment_hash=653f93898c1d57423a76e3a90231827bfd3704a50a87652bae178ae02c5cd1d5&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to