codeant-ai-for-open-source[bot] commented on code in PR #38576:
URL: https://github.com/apache/superset/pull/38576#discussion_r3482475083


##########
superset/utils/screenshots.py:
##########
@@ -280,43 +281,66 @@ def compute_and_cache(  # pylint: 
disable=too-many-arguments
         :return: Image payload
         """
         cache_key = cache_key or self.get_cache_key(window_size, thumb_size)
-        cache_payload = self.get_from_cache_key(cache_key) or 
ScreenshotCachePayload()
-        if not cache_payload.should_trigger_task(force=force):
-            logger.info(
-                "Skipping compute - already processed for thumbnail: %s", 
cache_key
-            )
-            return
-
-        window_size = window_size or self.window_size
-        thumb_size = thumb_size or self.thumb_size
-        logger.info("Processing url for thumbnail: %s", cache_key)
-        cache_payload.computing()
-        image = None
-        # Assuming all sorts of things can go wrong with Selenium
         try:
-            logger.info("trying to generate screenshot")
-            with 
event_logger.log_context(f"screenshot.compute.{self.thumbnail_type}"):
-                image = self.get_screenshot(user=user, window_size=window_size)
-        except Exception as ex:  # pylint: disable=broad-except
-            logger.warning("Failed at generating thumbnail %s", ex, 
exc_info=True)
-            cache_payload.error()
-        if image and window_size != thumb_size:
-            try:
-                image = self.resize_image(image, thumb_size=thumb_size)
-            except Exception as ex:  # pylint: disable=broad-except
-                logger.warning("Failed at resizing thumbnail %s", ex, 
exc_info=True)
-                cache_payload.error()
+            with DistributedLock(
+                namespace="thumbnail",
+                key=cache_key,
+                ttl_seconds=app.config["THUMBNAIL_COMPUTING_CACHE_TTL"],
+            ):

Review Comment:
   **Suggestion:** The lock lease is fixed to `THUMBNAIL_COMPUTING_CACHE_TTL`, 
so if rendering exceeds that TTL the lease can expire and a second worker can 
enter the same critical section concurrently. Because lock release is 
unconditional in the distributed lock backend, the earlier worker can also 
delete a later worker's lock, further widening the race window. Use 
owner-token-based unlock semantics or refresh/extend the lease during 
long-running screenshot work. [race condition]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Thumbnail Celery tasks may run concurrently after lock expiry.
   - ⚠️ Distributed lock can delete a newer holder's Redis key.
   - ⚠️ Duplicate Selenium sessions undermine task-level deduplication.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. A thumbnail task invokes `BaseScreenshot.compute_and_cache()` for a given 
`cache_key`
   at `superset/utils/screenshots.py:265-338`; the method acquires a 
distributed lock via
   `with DistributedLock(namespace="thumbnail", key=cache_key,
   ttl_seconds=app.config["THUMBNAIL_COMPUTING_CACHE_TTL"]):` at lines 285-289.
   
   2. The `DistributedLock` context manager in 
`superset/distributed_lock/__init__.py:28-64`
   calls `AcquireDistributedLock(namespace, params, ttl_seconds).run()`, which, 
when Redis
   coordination is configured, uses `AcquireDistributedLock._acquire_redis()` at
   `superset/commands/distributed_lock/acquire.py:88-99` to execute
   `redis_client.set(self.redis_lock_key, "1", nx=True, ex=self.ttl_seconds)`, 
creating a
   fixed-TTL Redis key without any owner token.
   
   3. If the screenshot and resize work in `compute_and_cache()` 
(selenium/playwright
   rendering plus `resize_image`) runs longer than 
`THUMBNAIL_COMPUTING_CACHE_TTL`
   (configured as 360 seconds at `superset/config.py:1133-1135`), Redis will 
automatically
   expire the lock key while worker A is still inside the `with 
DistributedLock(...)` block.
   A second worker B starting `compute_and_cache()` for the same `cache_key` 
after expiry
   will successfully acquire the same lock key and enter the same critical 
section
   concurrently.
   
   4. When worker A eventually exits the `with` block, `DistributedLock` calls
   `ReleaseDistributedLock(namespace, params).run()` (see
   `superset/distributed_lock/__init__.py:60-64`), which in the Redis backend 
executes
   `_release_redis()` at `superset/commands/distributed_lock/release.py:53-57`, 
performing an
   unconditional `redis_client.delete(self.redis_lock_key)`. This can delete 
worker B's
   newly-acquired lock and widen the race window, allowing overlapping Selenium 
sessions and
   breaking the intended "only-one-worker-per-thumbnail" guarantee.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=58e3d22b35cf48dbb7f947eac23d9955&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=58e3d22b35cf48dbb7f947eac23d9955&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/screenshots.py
   **Line:** 285:289
   **Comment:**
        *Race Condition: The lock lease is fixed to 
`THUMBNAIL_COMPUTING_CACHE_TTL`, so if rendering exceeds that TTL the lease can 
expire and a second worker can enter the same critical section concurrently. 
Because lock release is unconditional in the distributed lock backend, the 
earlier worker can also delete a later worker's lock, further widening the race 
window. Use owner-token-based unlock semantics or refresh/extend the lease 
during long-running screenshot work.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38576&comment_hash=1ce20a82f6f3401a33f6ee114145b53f3a21f85369444114e73e2325ee17e741&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38576&comment_hash=1ce20a82f6f3401a33f6ee114145b53f3a21f85369444114e73e2325ee17e741&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to