john-bodley opened a new issue, #25108:
URL: https://github.com/apache/superset/issues/25108

   ## [SIP-98B] Proposal for (re)defining a “unit of work”
   
   This SIP is part of the [[SIP-98] Proposal for correctly handling business 
logic](https://github.com/apache/superset/issues/25048) series. Specifically, 
it proposes formalizing the “unit of work” construct. The topics outlined here 
are highly coupled with [[SIP-98A] Primer for managing SQLAlchemy 
sessions](https://github.com/apache/superset/issues/25107) as they both relate 
to managing database transactions.
   
   ### Unit of Work
   
   The “unit of work” is used to group multiple service operations into a 
single atomic unit. It materializes as a database transaction which—with the 
support of a SQLAlchemy session—allows us to create a block of code within 
which database atomicity is guaranteed. If the block of code is successfully 
completed, the changes are committed to the database. If there is an exception, 
the changes are rolled back.
   
   SQLAlchemy also supports this 
[feature](https://www.sqlalchemy.org/features.html):
   
   > The Unit Of Work system, a central part of SQLAlchemy's Object Relational 
Mapper (ORM), organizes pending `INSERT`/`UPDATE`/`DELETE` operations into 
queues and flushes them all in one batch.
   
   Historically, Superset’s “unit of work” has been ill-defined, mismanaged,  
and/or misconstrued. It has resulted in us over (partial) committing*—a 
somewhat typical SQLAlchemy mistake (see [SIP-?A](...) for more details)—which 
violates the atomicity of the operation and leads to unnecessary complexity.
   
   \* The atomic unit vagueness has led to code inconsistencies which have 
resulted in us often adopting the “when in doubt, commit” mentality.
   
   [[SIP-35] Proposal for Improving Superset’s Python Code 
Organization](https://github.com/apache/superset/issues/9077) introduced the 
concept of a Command and Data Access Object (DAO) which (clearly and 
rightfully) stated,
   
   > Commands perform a single cohesive business operation, have a very small 
public API, and either entirely succeed or entirely fail. In practice, this 
requires that database transaction behavior be controlled at the Command level. 
When possible, commands should be modeled such that they perform a single 
database transaction. In the case of failure, they raise an exception or return 
an error code, and it is the responsibility of the caller to translate that 
into a correct response for a user.
   
   however, sadly the code examples (where the database logic is defined 
entirely within the DAO which—via the commit and rollback operations—ends the 
transaction) likely led people astray—violating the “unit of work” concept.
   
   #### Examples
   
   The following examples illustrate where the atomic business operation has 
been violated:
   
   - The 
[`BaseDAO.create()`](https://github.com/apache/superset/blob/60889d27edeeb306cff763743254ca0655faf4b5/superset/daos/base.py#L131-L163)
 method auto commits by default, which means that the DAO, as opposed to the 
Command, acts as the “unit of work”.
   - The 
[`QueryDAO.stop_query()`](https://github.com/apache/superset/blob/60889d27edeeb306cff763743254ca0655faf4b5/superset/daos/query.py#L79-L100)
 method over commits.
   - The 
[`ObjectUpdater.after_insert()`](https://github.com/apache/superset/blob/60889d27edeeb306cff763743254ca0655faf4b5/superset/tags/models.py#L164-L183)
 SQLAlchemy [event handler](https://docs.sqlalchemy.org/en/20/core/event.html) 
instantiates a session (and thus transaction) outside of the Flask-SQLAlchemy 
session and explicitly commits.
   - The 
[`UpdateDashboardCommand.run()`](https://github.com/apache/superset/blob/60889d27edeeb306cff763743254ca0655faf4b5/superset/dashboards/commands/update.py#L49-L66)
 method prevents command chaining due to the explicit commit.
   - The 
[`EmbeddedDashboardDAO.upsert()`](https://github.com/apache/superset/blob/60889d27edeeb306cff763743254ca0655faf4b5/superset/daos/dashboard.py#L375C1-L387)
 method explicitly commits.
   
   The following examples illustrate where preserving the atomic business 
operation has added complexity or inconsistency which is difficult to grok:
   
   - The 
[`DatasetDao.update()`](https://github.com/apache/superset/blob/60889d27edeeb306cff763743254ca0655faf4b5/superset/daos/dataset.py#L149-L172)
 method has complex commit chaining.
   - The 
[`DashboardDAO.update_chart_owners()`](https://github.com/apache/superset/blob/60889d27edeeb306cff763743254ca0655faf4b5/superset/daos/dashboard.py#L183-L190)
 and 
[`DashboardDAO.set_dash_metadata()`](https://github.com/apache/superset/blob/60889d27edeeb306cff763743254ca0655faf4b5/superset/daos/dashboard.py#L192-L284)
 methods are inconsistent, i.e., (by default) the former and latter commit and 
do not commit respectively—which leads to ambiguity within the same DAO in 
terms of the meaning of `set` and `update`.
   
   ### Proposed Change
   
   #### Unit of Work
   
   In the context of Flask a typical "unit of work" is a request, however in 
addition to the RESTful API, Superset provides a CLI, and thus not all 
operations occur within the confines of a Flask request. 
   
   Per [SIP-35](https://github.com/apache/superset/issues/9077) Commands really 
are the best representation of a “unit of work” as they provide a single 
cohesive business operation. Note not all RESTful API endpoints interface with 
a Command and thus, under these circumstances, the RESTful API can serve as the 
de facto atomic unit.
   
   This proposal is inline with other 
[recommendations](https://stackoverflow.com/questions/773393/dao-pattern-where-do-transactions-fit-in)
 that state one should
   
   > ... decouple your business layer from any DAO technologies.
   
   and 
   
   > ... keep the control of the transaction in the business layer because 
[the] transaction is a business problem.
   
   #### Nested Transactions
   
   Using a Command (as opposed to a DAO) as the “unit of work” does not 
mitigate the over commit problem completely as it is conceivable that a 
business transaction may not be encapsulated by a single Command.
   
   To address this—without adding code complexity—we recommend leveraging 
SQLAlchemy's [nested 
transaction](https://docs.sqlalchemy.org/en/20/orm/session_transaction.html#nested-transaction)
 which is both viable as _all_ our metadata engines—MySQL, PostgreSQL, and 
SQLite— support the `SAVEPOINT` construct and conducive to our design pattern.
   
   Used in conjunction with a context manager, upon exit, the outermost 
returned transaction object is either committed or rolled back to the 
`SAVEPOINT`. See 
[here](https://docs.sqlalchemy.org/en/20/orm/session_basics.html#framing-out-a-begin-commit-rollback-block)
 for the pseudo implementation.
   
   The merits of this approach are:
   
   - Ensures that the “unit of work” is indeed a single atomic unit.
   - Does not require an explicit `Session.flush()`, `Session.commit()`, or 
`Session.rollback()` as this is provided by the context manager. 
   - Does not require a priori knowledge of what constitutes a “unit of work” 
which is highly context-specific, i.e., the Command can be somewhat agnostic 
with regards to which nested level it represents—as long as it adheres to (2).
   
   #### Event Handlers
   
   Apart from asynchronous Celery tasks, only the Flask-SQLAlchemy singleton 
session should be used. Event handlers should share the same session of the 
mapped instance being persisted, i.e.,
   
   ```python
   @event.listens_for(Model, "after_insert")
   def after_insert(mapper: Mapper, connection: Connection, target: Model) -> 
None:
       session = inspect(target).session
       ...
   ```
   
   #### Testing
   
   Tests should leverage the nested transaction (with a twist). Typically tests 
should upon startup:
   
   - Create a new session
   - Use a nested transaction
   
   And upon teardown:
   
   - Rollback (as opposed to committing) the transaction (if possible)
   - Expunge all the objects from the session
   - Close the session
   
   A `pytest` fixture similar to [this](https://stackoverflow.com/a/38626139) 
(with function scope) achieves the desired functionality. Note the fixture uses 
the nested transcript construct sans context manager to ensure that the 
transaction is never explicitly committed.
   
   #### Examples
   
   The following code illustrates how the DAO and Command should be 
constructed, where it is evident that the Command (business layer) has control 
of the transaction.
   
   ```python
   class BaseDAO:
       @classmethod
       def update(cls, item: T, attributes: dict[str, Any]) -> T:    
           for key, value in attributes.items():
               setattr(item key, value)
       
           db.session.merge(item)
   ```
   
   ```python
   class UpdateDashboardCommand:
       def run(self, item: Dashboard, attributes: dict[str, Any]) -> Dashboard:
           try:
               with db.session.begin_nested():
                   dashboard = DashboardDAO.update(item, attributes)
                   ChartDAO.update_owners(dashboard)
           except SQLAlchemyError as ex:
               raise DashboardUpdateFailedError() from ex 
   ```
   
   SQLAlchemy event handlers typically either call `Connection.execute()` 
(which auto-commit) or instantiate a new session (and corresponding 
transaction)—both of which violate the atomic unit construct.
   
   Though one can obtain the session bound to the `target` (via 
`inspect(target).session`)—which should be the same as the Flask-SQLALchemy 
session—invoking additional database operations can be 
[problematic]([here](https://stackoverflow.com/questions/43164546/sawarning-usage-of-the-session-add-operation-is-not-currently-supported-wit).
   
   Possible resolutions include either instantiating a nested session within 
the event handler or migrating the various operations—which typically represent 
business logic—to a Command.
   
   ### Flask-AppBuilder
   
   Superset relies heavily on 
[Flask-AppBuilder]([https://github.com/dpgaspar/Flask-AppBuilder](https://github.com/dpgaspar/Flask-AppBuilder/blob/74f37e21a3c9c7ca7fb3e56f73759e3eaa2ead6b/flask_appbuilder/security/sqla/manager.py#L203-L235))
 (FAB) which, per this 
[`SecurityManager.add_user()`](https://github.com/dpgaspar/Flask-AppBuilder/blob/74f37e21a3c9c7ca7fb3e56f73759e3eaa2ead6b/flask_appbuilder/security/sqla/manager.py#L203-L235),
 has a tendency to explicitly commit, thus violating our definition of “unit of 
work”, i.e., the following,
   
   ```python
   with db.session.begin_nested():
       security_manager.add_user(...)
       db.session.rollback()
   ```
   
   will not roll back to the previous savepoint, as a simple rollback or commit 
updates the savepoints.
   
   There are a couple of options available to address this issue:
   
   1. Override the 
[`SecurityManager.get_session()`](https://github.com/dpgaspar/Flask-AppBuilder/blob/74f37e21a3c9c7ca7fb3e56f73759e3eaa2ead6b/flask_appbuilder/security/sqla/manager.py#L90-L92)
 property by providing a monkey patched session (associated with nested 
transaction) where the commit operation is a no-op.
   2. Have FAB use nested transactions removing the need to explicit commits.
   
   Though (2) is preferable, (1) can be implemented as follows:
   
   ```python
   class SupersetSecurityManager(SecurityManager):
       @property
       def get_session(self) -> Session:
           with db.session.begin_nested() as transaction:
               transaction.session.commit = lambda: None
               return transaction.session
   ```        
   
   ### New or Changed Public Interfaces
   
   None.
   
   ### New Dependencies
   
   May require an update to FAB to ensure that the atomic unit remains intact.
   
   ### Migration Plan and Compatibility
   
   - A wiki page will be created to specify the rules which should be adhered 
to.
   - A series of PRs to address existing rule violations.
   - New PRs (where possible) should adhere to the rules.
   
   None.
   
   ### Rejected Alternatives
   
   None.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to