codeant-ai-for-open-source[bot] commented on code in PR #38576:
URL: https://github.com/apache/superset/pull/38576#discussion_r3482475083
##########
superset/utils/screenshots.py:
##########
@@ -280,43 +281,66 @@ def compute_and_cache( # pylint:
disable=too-many-arguments
:return: Image payload
"""
cache_key = cache_key or self.get_cache_key(window_size, thumb_size)
- cache_payload = self.get_from_cache_key(cache_key) or
ScreenshotCachePayload()
- if not cache_payload.should_trigger_task(force=force):
- logger.info(
- "Skipping compute - already processed for thumbnail: %s",
cache_key
- )
- return
-
- window_size = window_size or self.window_size
- thumb_size = thumb_size or self.thumb_size
- logger.info("Processing url for thumbnail: %s", cache_key)
- cache_payload.computing()
- image = None
- # Assuming all sorts of things can go wrong with Selenium
try:
- logger.info("trying to generate screenshot")
- with
event_logger.log_context(f"screenshot.compute.{self.thumbnail_type}"):
- image = self.get_screenshot(user=user, window_size=window_size)
- except Exception as ex: # pylint: disable=broad-except
- logger.warning("Failed at generating thumbnail %s", ex,
exc_info=True)
- cache_payload.error()
- if image and window_size != thumb_size:
- try:
- image = self.resize_image(image, thumb_size=thumb_size)
- except Exception as ex: # pylint: disable=broad-except
- logger.warning("Failed at resizing thumbnail %s", ex,
exc_info=True)
- cache_payload.error()
+ with DistributedLock(
+ namespace="thumbnail",
+ key=cache_key,
+ ttl_seconds=app.config["THUMBNAIL_COMPUTING_CACHE_TTL"],
+ ):
Review Comment:
**Suggestion:** The lock lease is fixed to `THUMBNAIL_COMPUTING_CACHE_TTL`,
so if rendering exceeds that TTL the lease can expire and a second worker can
enter the same critical section concurrently. Because lock release is
unconditional in the distributed lock backend, the earlier worker can also
delete a later worker's lock, further widening the race window. Use
owner-token-based unlock semantics or refresh/extend the lease during
long-running screenshot work. [race condition]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Thumbnail Celery tasks may run concurrently after lock expiry.
- ⚠️ Distributed lock can delete a newer holder's Redis key.
- ⚠️ Duplicate Selenium sessions undermine task-level deduplication.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. A thumbnail task invokes `BaseScreenshot.compute_and_cache()` for a given
`cache_key`
at `superset/utils/screenshots.py:265-338`; the method acquires a
distributed lock via
`with DistributedLock(namespace="thumbnail", key=cache_key,
ttl_seconds=app.config["THUMBNAIL_COMPUTING_CACHE_TTL"]):` at lines 285-289.
2. The `DistributedLock` context manager in
`superset/distributed_lock/__init__.py:28-64`
calls `AcquireDistributedLock(namespace, params, ttl_seconds).run()`, which,
when Redis
coordination is configured, uses `AcquireDistributedLock._acquire_redis()` at
`superset/commands/distributed_lock/acquire.py:88-99` to execute
`redis_client.set(self.redis_lock_key, "1", nx=True, ex=self.ttl_seconds)`,
creating a
fixed-TTL Redis key without any owner token.
3. If the screenshot and resize work in `compute_and_cache()`
(selenium/playwright
rendering plus `resize_image`) runs longer than
`THUMBNAIL_COMPUTING_CACHE_TTL`
(configured as 360 seconds at `superset/config.py:1133-1135`), Redis will
automatically
expire the lock key while worker A is still inside the `with
DistributedLock(...)` block.
A second worker B starting `compute_and_cache()` for the same `cache_key`
after expiry
will successfully acquire the same lock key and enter the same critical
section
concurrently.
4. When worker A eventually exits the `with` block, `DistributedLock` calls
`ReleaseDistributedLock(namespace, params).run()` (see
`superset/distributed_lock/__init__.py:60-64`), which in the Redis backend
executes
`_release_redis()` at `superset/commands/distributed_lock/release.py:53-57`,
performing an
unconditional `redis_client.delete(self.redis_lock_key)`. This can delete
worker B's
newly-acquired lock and widen the race window, allowing overlapping Selenium
sessions and
breaking the intended "only-one-worker-per-thumbnail" guarantee.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=58e3d22b35cf48dbb7f947eac23d9955&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=58e3d22b35cf48dbb7f947eac23d9955&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/utils/screenshots.py
**Line:** 285:289
**Comment:**
*Race Condition: The lock lease is fixed to
`THUMBNAIL_COMPUTING_CACHE_TTL`, so if rendering exceeds that TTL the lease can
expire and a second worker can enter the same critical section concurrently.
Because lock release is unconditional in the distributed lock backend, the
earlier worker can also delete a later worker's lock, further widening the race
window. Use owner-token-based unlock semantics or refresh/extend the lease
during long-running screenshot work.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38576&comment_hash=1ce20a82f6f3401a33f6ee114145b53f3a21f85369444114e73e2325ee17e741&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38576&comment_hash=1ce20a82f6f3401a33f6ee114145b53f3a21f85369444114e73e2325ee17e741&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]