villebro opened a new pull request, #29286: URL: https://github.com/apache/superset/pull/29286
### SUMMARY In the key value commands, we are often returning the keys from newly generated ORM objects, which are persisted in the `key_value` table. As we're currently using `commit()` to persist the objects prior to reading the keys, the session is reset, requiring a new session for fetching the keys. This can cause issues when distributing reads to read replicas, as the data may sometimes not be available: ``` sqlalchemy.orm.exc.ObjectDeletedError: Instance '<KeyValueEntry at 0x7fe508293280>' has been deleted, or its row is otherwise not present. ``` Furthermore, since we often delete expired records before inserting new ones, collisions can occur if the commands aren't handled in the same session. For this reason we should avoid committing when flushing is sufficient, as stated in [SIP-99A](#25107): > It is worth noting that objects within the session, in either pending* or persisted state, can be queried albeit not having been committed to the database. We have a tendency to over commit (be that in the code or tests) resulting in fractured/partial atomic units. Typically working with objects in a persisted state is sufficient. While fixing these, I also noticed a case where we're purging expired entries in the Metastore cache's `add` method _after_ inserting a new entry. This has the potential of causing a key conflict, as a previous entry with the same key may already be present but expired (note that in the `get` key value command we're filtering out expired entries: https://github.com/apache/superset/blob/c7b8ae9013dcbffb3bb0dda4dc1d57cc9c18048c/superset/commands/key_value/get.py#L70-L72) Finally, as we're replacing `commit()` with `flush()`, we need to ensure that transactions are scoped correctly. Usually we want to encapsulate many commands in a single transaction. For example, for the create/upsert commands, we usually want to do the following in a single transaction: 1. delete expired entries 2. insert new entry However, there are instances where we explicitly need for commits to happen during execution flow. An good example of this is the Key Value Distributed Lock, which needs to commit the lock to the metastore so other workers can observe it. For this reason we're adding explicit `db.session.commit()` calls both in the Metastore Cache and Lock context manager to ensure the values are persisted in the db during execution flow. ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF <!--- Skip this if not applicable --> ### TESTING INSTRUCTIONS <!--- Required! What steps can be taken to manually verify the changes? --> ### ADDITIONAL INFORMATION <!--- Check any relevant boxes with "x" --> <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue --> - [ ] Has associated issue: - [ ] Required feature flags: - [ ] Changes UI - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351)) - [ ] Migration is atomic, supports rollback & is backwards-compatible - [ ] Confirm DB migration upgrade and downgrade tested - [ ] Runtime estimates and downtime expectations provided - [ ] Introduces new feature or API - [ ] Removes existing feature or API -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
