john-bodley opened a new issue, #25107:
URL: https://github.com/apache/superset/issues/25107

   ## [SIP-98A] Primer on managing SQLAlchemy sessions
   
   This SIP is part of the [[SIP-98] Proposal for correctly handling business 
logic](https://github.com/apache/superset/issues/25048) series. Specifically, 
it serves as a primer—reiterating best practices—for working with SQLAlchemy 
sessions as these were typically mismanaged. The topics outlined here are 
highly coupled with [[SIP-98B] Proposal for (re)defining a “unit of work”](...) 
as they both relate to managing database transactions.
   
   ### Session Management 
   
   The [SQLAlchemy 
session](https://docs.sqlalchemy.org/en/20/orm/session_basics.html)—which 
establishes all conversations with the database—is managed by 
[Flask-SQLAlchemy](https://github.com/pallets-eco/flask-sqlalchemy) which also 
handles cleaning up connections and sessions after each 
[request](https://github.com/pallets-eco/flask-sqlalchemy/blob/564a96f2ee07819aac2dcca7d9eb58dcfeb1765f/src/flask_sqlalchemy/extension.py#L379-L386).
 
   
   The preconfigured scoped session (called `db.session`) should be used for 
the lifetime of a request to ensure there is only one “virtual” 
transaction—guaranteeing that the “unit of work” is indeed atomic 
([SIP-98B](...)).
   
   The SQLAlchemy [Managing 
Transactions](https://docs.sqlalchemy.org/en/20/orm/session_transaction.html#managing-transactions)
 documentation states:
   
   > The Session tracks the state of a single “virtual" transaction at a time. 
This “virtual" transaction is created automatically when needed, or can 
alternatively be started using the `Session.begin()` method. `Session.commit()` 
ends the transaction.
   
   The transaction is typically framed as follows, 
   
   ```python
   try:
       db.session.add(Dashboard(slug="foo"))
       db.session.commit()
   except SQLAlchemyError:
       db.session.rollback()
       raise
   ```
   
   though, via leveraging a context manager, the `Session.begin()` method 
serves as a demarcated transaction and provides the same sequence of operations:
   
   ```python
   with db.session.begin():
       db.session.add(Dashboard(slug="foo"))
   ```
   
   #### Session States
   
   There are [five 
states](https://docs.sqlalchemy.org/en/20/orm/session_state_management.html) 
which an object can have within a session:
   
   - Transient
   - Pending
   - Persisted
   - Deleted
   - Detached
   
   The session tracks objects that are either new, dirty, or deleted.
   
   The 
[`Session.flush()`](https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.flush)
 method communicates a series of operations to the database (`INSERT`, 
`UPDATE`, `DELETE`). The database maintains them as pending operations in a 
transaction. The changes are not persisted permanently to disk, or visible to 
other transactions until the database receives a `COMMIT` statement via the 
[`Session.commit()`](https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.commit)
 method—thus ending the transaction. If `Session.commit()` is called, then 
`Session.flush()` is called automatically.
   
   The 
[`Session.merge()`](https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.Session.merge)
 convenience method copies the state of a given instance into a corresponding 
instance within the session.
   
   > `Session.merge()` examines the primary key attributes of the source 
instance, and attempts to reconcile it with an instance of the same primary key 
in the session. If not found locally, it attempts to load the object from the 
database based on the primary key, and if none can be located, creates a new 
instance. The state of each attribute on the source instance is then copied to 
the target instance.
   
   Without a primary key `Session.merge()` is akin to `Session.add()`,
   
   ```python
   db.session.merge(Dashboard(slug="foo"))
   db.session.flush()  # INSERT INTO dashboards ...
   ```
   
   ```python
   db.session.merge(Dashboard(id=1, slug="foo"))  # SELECT * FROM dashboards 
WHERE id = ?
   db.session.flush()  # UPDATE dashboards SET ...
   ```
   
   Note the use of the `Session.merge()` method is often misinterpreted, i.e., 
a persistent object does not need to be (re)merged. If the object is dirty the 
pending changes will be persisted during a subsequent flush.
   
   
   #### Transient
   
   An object that is not in the session. 
   
   ```python
   >>> dashboard = Dashboard(slug="foo")
   >>> inspect(dashboard).transient
   True
   >>> db.session.new
   IdentitySet([])
   >>> db.session.add(dashboard)
   >>> db.session.new
   IdentitySet([Dashboard<foo>])
   >>> inspect(dashboard).transient
   False
   >>> db.session.remove()
   ```
   
   #### Pending
   
   An object which has been added to the session but has not yet been flushed 
to the database.
   
   ```python
   >>> dashboard = Dashboard(slug="foo")
   >>> db.session.add(dashboard)
   >>> inspect(dashboard).pending
   True
   >>> db.session.new
   IdentitySet([Dashboard<foo>])
   >>> db.session.flush()  # INSERT INTO dashboards ...
   >>> db.session.new
   IdentitySet([])
   >>> inspect(dashboard).pending
   False
   >>> db.session.remove()
   ```
   
   Note that constraints will only be checked when the object is flushed.
   
   ```python
   >>> db.session.add_all([Dashboard(slug="bar"), Dashboard(slug="bar")])
   >>> db.session.flush()  # INSERT INTO dashboards ...
   sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint 
failed: dashboards.slug
   >>> db.session.remove()
   ```
   
   #### Persistent
   
   An object which is present in the session and has a corresponding record in 
the database. The persisted state occurs either during 
[flushing](https://docs.sqlalchemy.org/en/20/orm/session_basics.html#flushing) 
(which is invoked by `Session.merge()`) or querying. 
   
   ```python
   >>> dashboard = Dashboard(slug="foo")
   >>> db.session.add(dashboard)
   >>> db.session.flush()  # INSERT INTO dashboards ...
   >>> inspect(dashboard).persistent
   True
   >>> db.session.dirty
   IdentitySet([])
   >>> dashboard.slug = "bar"  
   >>> db.session.dirty
   IdentitySet([Dashboard<1>])
   >>> db.session.flush()  # UPDATE dashboards SET ...
   >>> db.session.dirty
   IdentitySet([])
   >>> db.session.remove()
   ```
   
   It is worth noting that objects within the session, in either pending* or 
persisted state, can be queried albeit not having been committed to the 
database. We have a tendency to over commit (be that in the code or tests) 
resulting in fractured/partial atomic units. Typically working with objects in 
a persisted state is sufficient.
   
   ```python
   >>> dashboard = Dashboard(slug="foo")
   >>> db.session.add(dashboard)
   >>> inspect(dashboard).pending
   True
   >>> dashboard is db.session.query(Dashboard).filter(Dashboard.slug == 
"foo").one()
   >>> True
   >>> inspect(dashboard).persistent
   True
   >>> db.session.remove()
   ```
   
   \* The `Session.query()` method invoked a `Session.flush()` which meant that 
the pending object became persisted.
   
   #### Deleted
   
   An object which has been deleted within a flush, but the transaction has not 
yet completed.
   
   ```python
   >>> dashboard = Dashboard(slug="foo")
   >>> db.session.add(dashboard)
   >>> db.session.flush()  # INSERT INTO dashboards ...
   >>> inspect(dashboard).deleted
   False
   >>> db.session.deleted
   IdentitySet([])
   >>> db.session.delete(dashboard)
   >>> db.session.deleted
   IdentitySet([Dashboard<foo>])
   >>> db.session.flush()  # DELETE FROM dashboards WHERE id = ?
   >>> inspect(dashboard).deleted
   True
   >>> db.session.remove()
   ```
   
   #### Detached
   
   An object that corresponds (or previously corresponded) to a record in the 
database, but is not currently in any session.
   
   ```python
   >>> dashboard = Dashboard(slug="foo")
   >>> db.session.add(dashboard)
   >>> db.session.flush()  # INSERT INTO dashboards ...
   >>> inspect(dashboard).detached
   >>> False
   >>> db.session.remove()
   >>> inspect(dashboard).detached
   True
   ```
   
   #### Examples
   
   The following examples illustrate where the session was mismanaged:
   
   - The 
[`TableSchemaView.expanded()`](https://github.com/apache/superset/blob/3630d6844c0f4668f7196beadd744e582c9219bd/superset/views/sql_lab/views.py#L217-L228)
 method unnecessarily commits even though the function only fetches, i.e., 
there are no new, dirty, or deleted objects in the session.
   - The 
[`UpdateKeyValueCommand.update()`](https://github.com/apache/superset/blob/f584c8462b6d72f902a59e5f72a0f4603de4da1a/superset/key_value/commands/update.py#L77-L92)
 method unnecessarily merges the dirty `event` object even though it was 
already persisted—per the `Session.query()` call—in the session.
   - The 
[test_get_expired_entry](https://github.com/apache/superset/blob/f584c8462b6d72f902a59e5f72a0f4603de4da1a/tests/integration_tests/key_value/commands/get_test.py#L66-L82)
 test unnecessarily commits the `entry` object to the database given that the 
[`GetKeyValueCommand.run()`](https://github.com/apache/superset/blob/f584c8462b6d72f902a59e5f72a0f4603de4da1a/superset/key_value/commands/get.py#L58-L62)
 method leverages the same Flask-SQLAlchemy session. This requires additional 
work to then delete the object from the database to ensure that the test 
remains idempotent.
   
   ### Proposed Change
   
   None beyond adhering to best practices when managing SQLAlchemy sessions—via 
the Flask-SQLAlchemy singleton scoped session.
   
   ### New or Changed Public Interfaces
   
   None.
   
   ### New Dependencies
   
   None.
   
   ### Migration Plan and Compatibility
   
   - A wiki page will be created to specify the rules which should be adhered 
to.
   - A series of PRs to address existing rule violations.
   - New PRs (where possible) should adhere to the rules.
   
   None.
   
   ### Rejected Alternatives
   
   None.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to