rusackas opened a new pull request, #40663:
URL: https://github.com/apache/superset/pull/40663

   ### SUMMARY
   
   The metastore-backed key-value store records an `expires_on` timestamp for 
entries written with a timeout, for example by the `SupersetMetastoreCache` 
backend used by `filter_state` and `explore_form_data`. Unlike cache backends 
that expire entries on their own (e.g. Redis), the SQL metastore never deletes 
rows by itself, so once an entry's TTL passes the row simply stays in the 
`key_value` table. Over time these expired rows accumulate and the table keeps 
growing.
   
   This adds routine housekeeping to clean them up:
   
   - **`KeyValuePruneCommand`** (`superset/key_value/commands/prune.py`) 
deletes entries whose `expires_on` is in the past, in batches, mirroring the 
existing `LogPruneCommand` / `TaskPruneCommand` shape (batched `IN`-clause 
deletes, optional `max_rows_per_run` cap, oldest-first deterministic ordering, 
progress logging). Entries with no expiry (`expires_on IS NULL`) are left 
untouched.
   - **`prune_key_value` Celery task** (`superset/tasks/scheduler.py`) wraps 
the command, mirroring the existing `prune_logs` / `prune_tasks` tasks.
   - **Commented-out beat schedule entry** in `superset/config.py`, following 
the exact convention already used for `prune_logs` and `prune_tasks` (opt-in, 
daily at midnight, with a `max_rows_per_run` kwarg).
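
To illustrate the batching behaviour described in the list above, here is a minimal sketch using the standard-library `sqlite3` module. It is not the code from this PR: the function name `prune_expired`, the `batch_size` parameter, and the simplified two-column `key_value` table are assumptions made for the example. It only shows the general shape (oldest-first selection, `IN`-clause deletes, optional `max_rows_per_run` cap, rows with `expires_on IS NULL` left alone).

```python
import sqlite3
from datetime import datetime
from typing import Optional


def prune_expired(
    conn: sqlite3.Connection,
    batch_size: int = 500,
    max_rows_per_run: Optional[int] = None,
) -> int:
    """Delete expired rows oldest-first in batches; return the number deleted.

    Hypothetical helper: assumes a table key_value(id, expires_on) where
    expires_on is stored as a naive "YYYY-MM-DD HH:MM:SS.ffffff" string.
    """
    now = datetime.now().isoformat(sep=" ", timespec="microseconds")
    total = 0
    while max_rows_per_run is None or total < max_rows_per_run:
        limit = batch_size
        if max_rows_per_run is not None:
            # Never exceed the per-run cap, even on the last batch.
            limit = min(batch_size, max_rows_per_run - total)
        # Deterministic, oldest-first batch of expired ids. A NULL
        # expires_on never satisfies "<", so no-expiry rows are skipped.
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM key_value WHERE expires_on < ? "
                "ORDER BY expires_on, id LIMIT ?",
                (now, limit),
            )
        ]
        if not ids:
            break
        placeholders = ",".join("?" * len(ids))
        conn.execute(f"DELETE FROM key_value WHERE id IN ({placeholders})", ids)
        conn.commit()  # commit per batch to keep each transaction short
        total += len(ids)
    return total
```

Committing after every batch keeps individual transactions small, which is the usual reason for deleting in batches rather than with one large `DELETE`.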
   
   The expiry comparison uses naive `datetime.now()` to stay consistent with 
how the metastore cache writes `expires_on` (`datetime.now() + timedelta(...)`) 
and how `KeyValueEntry.is_expired()` already compares it.
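
A short sketch of why the comparison side has to match the write side. The `is_expired` function here is a hypothetical stand-in written for this example, not the `KeyValueEntry.is_expired()` method itself; it shows that a naive timestamp compares cleanly against naive `datetime.now()`, while mixing in a timezone-aware value raises `TypeError`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(expires_on: Optional[datetime], now: Optional[datetime] = None) -> bool:
    """Hypothetical expiry check using naive local time by default."""
    if expires_on is None:
        return False  # entries without an expiry never expire
    return expires_on <= (now or datetime.now())


# Written the same way the summary describes: naive now() plus a timedelta.
written = datetime.now() + timedelta(seconds=-1)
print(is_expired(written))  # naive vs. naive: comparable

try:
    # Comparing the naive value against an aware "now" is an error in Python.
    is_expired(written, now=datetime.now(timezone.utc))
except TypeError as exc:
    print(f"TypeError: {exc}")
```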
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   Not applicable.
   
   ### TESTING INSTRUCTIONS
   
   Unit tests added at `tests/unit_tests/key_value/prune_test.py` cover: 
expired rows deleted, non-expired and no-expiry rows retained, empty-store 
no-op, and the `max_rows_per_run` cap.
   
   ```
   pytest tests/unit_tests/key_value/prune_test.py
   ```
   
   To enable the scheduled prune in a deployment, uncomment the 
`prune_key_value` block in the `CeleryConfig.beat_schedule` in 
`superset/config.py`.
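
For orientation, an enabled entry might look roughly like the fragment below. This is a sketch based only on the description above (task name `prune_key_value`, daily at midnight, a `max_rows_per_run` kwarg); the `10000` value is a placeholder chosen for the example, and the exact block in `superset/config.py` may differ.

```python
from celery.schedules import crontab

# Assumed shape of the entry inside CeleryConfig.beat_schedule.
beat_schedule = {
    "prune_key_value": {
        "task": "prune_key_value",
        "schedule": crontab(minute=0, hour=0),  # daily at midnight
        "kwargs": {"max_rows_per_run": 10000},  # placeholder cap, tune per deployment
    },
}
```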
   
   ### ADDITIONAL INFORMATION
   
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

