richardfogaca commented on PR #38576:
URL: https://github.com/apache/superset/pull/38576#issuecomment-4803526935
PR #38576 — fix(thumbnails): add deduplication to dashboard thumbnail Celery
tasks
HEAD SHA: d05b963e7c522027bda94cd1c2f9a7e5d27ce02a
## Findings
### Medium / cleanup worth fixing
- [superset/utils/screenshots.py:280-281] `compute_and_cache` now imports
`DistributedLock` and `AcquireDistributedLockFailedException` inside the method
with only a pylint suppression. If this is not required for a real circular
import, moving these to module scope keeps the import graph explicit and avoids
the recurring import-outside-toplevel review issue.
Suggestion: move both imports to the top of
`superset/utils/screenshots.py`, or add a short comment naming the circular
dependency if the local import is intentional.
- [tests/unit_tests/tasks/test_thumbnails.py:133-134] The new cache-key
resolution test proves `get_cache_key()` is called, but it does not prove the
resolved key is passed to `compute_and_cache`. If line 112 regressed to
`cache_key=cache_key`, this test would still pass while the task-level lock
would operate on the wrong/default key path.
Suggestion: assert
`mock_screenshot.compute_and_cache.call_args.kwargs["cache_key"] ==
"resolved_cache_key"`.
### Low / PR text hygiene
- The PR body still says the API sets `COMPUTING` before enqueueing, while
the latest implementation intentionally moved the atomicity guarantee into
`compute_and_cache` via `DistributedLock`. Consider updating the summary so
future reviewers do not re-open the already-discussed API-level race point.
## Praise
- The current approach fixes the earlier self-skip issue by moving the
`COMPUTING` transition under the task lock, and the new tests cover lock-held
skip, stale `COMPUTING` retry, error caching, and stale image preservation.
## Validation reviewed
- GitHub PR files API at HEAD `d05b963e7c522027bda94cd1c2f9a7e5d27ce02a`
- Changed-file line anchors verified from GitHub contents at HEAD
- Preset/Superset Python review scans run across the six changed files; new
scan hits above, broad existing hits treated as pre-existing unless touched
- GitHub checks are green for the current head; I did not run local
pytest/runtime
## Paste-ready PR review body
Richard's agent here — I re-reviewed the current head `d05b963` after the
task-lock changes. No merge-blocking issues stood out to me; the earlier
self-skip path looks addressed by having `compute_and_cache` own the
`COMPUTING` transition under the distributed lock.
A couple of small follow-ups worth considering:
1. `superset/utils/screenshots.py:280-281` — small cleanup: could the new
`DistributedLock` / `AcquireDistributedLockFailedException` imports move to
module scope? If there is a real circular import, a short comment naming the
cycle would help; otherwise this looks like an import-outside-toplevel
suppression that reviewers often ask to remove.
2. `tests/unit_tests/tasks/test_thumbnails.py:133-134` — would it be worth
asserting that `compute_and_cache` receives `cache_key="resolved_cache_key"`?
The test currently proves `get_cache_key()` was called, but it would still pass
if the resolved key were not propagated into `compute_and_cache`.
Also minor PR-text hygiene: the body still describes API-level `COMPUTING`
marking before enqueue, while the latest design intentionally makes the API
check an optimization and puts correctness in the task lock. Updating that
summary would help future reviewers avoid re-opening the stale design point.
Nice improvement overall — the current tests around lock-held skip, stale
`COMPUTING`, error caching, and preserving the previous thumbnail make the
concurrency behavior much easier to follow.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]