[PR] fix(key-value): use flush instead of commit [superset]

via GitHub Tue, 18 Jun 2024 01:10:13 -0700


villebro opened a new pull request, #29286:
URL: https://github.com/apache/superset/pull/29286


   ### SUMMARY
   In the key value commands, we are often returning the keys from newly 
generated ORM objects, which are persisted in the `key_value` table. As we're 
currently using `commit()` to persist the objects prior to reading the keys, 
the session is reset, requiring a new session for fetching the keys. This can 
cause issues when distributing reads to read replicas, as the data may 
sometimes not be available:
   
   ```
   sqlalchemy.orm.exc.ObjectDeletedError: Instance '<KeyValueEntry at 
0x7fe508293280>' has been deleted, or its row is otherwise not present.
   ```
   
   Furthermore, since we often delete expired records before inserting new 
ones, collisions can occur if the commands aren't handled in the same session. 
For this reason we should avoid committing when flushing is sufficient, as 
stated in [SIP-99A](#25107):
   
   > It is worth noting that objects within the session, in either pending* or 
persisted state, can be queried albeit not having been committed to the 
database. We have a tendency to over commit (be that in the code or tests) 
resulting in fractured/partial atomic units. Typically working with objects in 
a persisted state is sufficient.
   
   While fixing these, I also noticed a case where we're purging expired 
entries in the Metastore cache's `add` method _after_ inserting a new entry. 
This has the potential of causing a key conflict, as a previous entry with the 
same key may already be present but expired (note that in the `get` key value 
command we're filtering out expired entries: 
https://github.com/apache/superset/blob/c7b8ae9013dcbffb3bb0dda4dc1d57cc9c18048c/superset/commands/key_value/get.py#L70-L72)
   
   Finally, as we're replacing `commit()` with `flush()`, we need to ensure 
that transactions are scoped correctly. Usually we want to encapsulate many 
commands in a single transaction. For example, for the create/upsert commands, 
we usually want to do the following in a single transaction:
   1. delete expired entries
   2. insert new entry
   
   However, there are instances where we explicitly need for commits to happen 
during execution flow. An good example of this is the Key Value Distributed 
Lock, which needs to commit the lock to the metastore so other workers can 
observe it. For this reason we're adding explicit `db.session.commit()` calls 
both in the Metastore Cache and Lock context manager to ensure the values are 
persisted in the db during execution flow.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   ### TESTING INSTRUCTIONS
   <!--- Required! What steps can be taken to manually verify the changes? -->
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] fix(key-value): use flush instead of commit [superset]

Reply via email to