richardfogaca commented on PR #38576:
URL: https://github.com/apache/superset/pull/38576#issuecomment-4803526935

   PR #38576 — fix(thumbnails): add deduplication to dashboard thumbnail Celery 
tasks
   HEAD SHA: d05b963e7c522027bda94cd1c2f9a7e5d27ce02a
   
   ## Findings
   
   ### Medium / cleanup worth fixing
   - [superset/utils/screenshots.py:280-281] `compute_and_cache` now imports 
`DistributedLock` and `AcquireDistributedLockFailedException` inside the method 
with only a pylint suppression. If this is not required for a real circular 
import, moving these to module scope keeps the import graph explicit and avoids 
the recurring import-outside-toplevel review issue.
     Suggestion: move both imports to the top of 
`superset/utils/screenshots.py`, or add a short comment naming the circular 
dependency if the local import is intentional.
   
   - [tests/unit_tests/tasks/test_thumbnails.py:133-134] The new cache-key 
resolution test proves `get_cache_key()` is called, but it does not prove the 
resolved key is passed to `compute_and_cache`. If line 112 regressed to 
`cache_key=cache_key`, this test would still pass while the task-level lock 
would operate on the wrong/default key path.
     Suggestion: assert 
`mock_screenshot.compute_and_cache.call_args.kwargs["cache_key"] == 
"resolved_cache_key"`.
   
   ### Low / PR text hygiene
   - The PR body still says the API sets `COMPUTING` before enqueueing, while 
the latest implementation intentionally moved the atomicity guarantee into 
`compute_and_cache` via `DistributedLock`. Consider updating the summary so 
future reviewers do not re-open the already-discussed API-level race point.
   
   ## Praise
   - The current approach fixes the earlier self-skip issue by moving the 
`COMPUTING` transition under the task lock, and the new tests cover lock-held 
skip, stale `COMPUTING` retry, error caching, and stale image preservation.
   
   ## Validation reviewed
   - GitHub PR files API at HEAD `d05b963e7c522027bda94cd1c2f9a7e5d27ce02a`
   - Changed-file line anchors verified from GitHub contents at HEAD
   - Preset/Superset Python review scans run across the six changed files; new 
scan hits above, broad existing hits treated as pre-existing unless touched
   - GitHub checks are green for the current head; I did not run local 
pytest/runtime
   
   ## Paste-ready PR review body
   
   Richard's agent here — I re-reviewed the current head `d05b963` after the 
task-lock changes. No merge-blocking issues stood out to me; the earlier 
self-skip path looks addressed by having `compute_and_cache` own the 
`COMPUTING` transition under the distributed lock.
   
   A couple of small follow-ups worth considering:
   
   1. `superset/utils/screenshots.py:280-281` — small cleanup: could the new 
`DistributedLock` / `AcquireDistributedLockFailedException` imports move to 
module scope? If there is a real circular import, a short comment naming the 
cycle would help; otherwise this looks like an import-outside-toplevel 
suppression that reviewers often ask to remove.
   
   2. `tests/unit_tests/tasks/test_thumbnails.py:133-134` — would it be worth 
asserting that `compute_and_cache` receives `cache_key="resolved_cache_key"`? 
The test currently proves `get_cache_key()` was called, but it would still pass 
if the resolved key were not propagated into `compute_and_cache`.
   
   Also minor PR-text hygiene: the body still describes API-level `COMPUTING` 
marking before enqueue, while the latest design intentionally makes the API 
check an optimization and puts correctness in the task lock. Updating that 
summary would help future reviewers avoid re-opening the stale design point.
   
   Nice improvement overall — the current tests around lock-held skip, stale 
`COMPUTING`, error caching, and preserving the previous thumbnail make the 
concurrency behavior much easier to follow.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to