[I] SIP-99C] Proposal for model and business validation [superset]

via GitHub Wed, 01 Nov 2023 13:25:39 -0700


john-bodley opened a new issue, #25828:
URL: https://github.com/apache/superset/issues/25828


   ## [SIP-99C] Proposal for model and business validation
   
   This SIP is part of the [[SIP-99] Proposal for correctly handling business 
logic](https://github.com/apache/superset/issues/25048) series. Specifically, 
it proposes formalizing where and how model and business validation should 
occur.
   
   ### Validation
   
   There are typically four places where validation can occur:
   
   1. Database: via CHECK, FOREIGN KEY, or UNIQUE constraints, triggers, etc.
   2. SQLAlchemy ORM: via the 
[`validates()`](https://docs.sqlalchemy.org/en/20/orm/mapped_attributes.html#simple-validators)
 decorator.
   3. DAOs: via the various 
[`validate_*()`](https://github.com/search?q=repo%3Aapache%2Fsuperset+%22def+validate_%22+path%3A%2F%5Esuperset%5C%2Fdaos%5C%2F%2F&type=code)
 methods.
   3. Commands: via the 
[`BaseCommand.validate()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/commands/base.py#L37-L43)
 class method. 
   
   These are ranked and thus preference should be given—per the "shift left" 
approach—for validation to occur at the lowest level possible to ensure 
consistency, adherence of the DRY principle, etc.
   
   It's worth noting  that validation at the database level is nuanced given 
how different databases handle `NULL` values from a distinct uniqueness 
perspective. A great example of this is ensuring that a dataset is unique—in 
terms of the (`schema`, `table_name`) tuple—where an explicit UNIQUE constraint 
[is 
missing](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/connectors/sqla/models.py#L523-L530),
 given that (as mentioned),
   
   > The reason it does not physically exist is MySQL, PostgreSQL, etc. have a 
different interpretation of uniqueness when it comes to `NULL` which is 
problematic given the schema is optional.
   
   i.e., in MySQL the (`NULL`, `foo`) and (`NULL`, `foo`) tuples are perceived 
as being different, yet from a dataset perspective we would perceive these as 
being the same object.
   
   The main challenges with our current approach are:
   
   1. We tend to specify/enforce non-business validation logic at the DAO or 
Command level.
   2. We tend to "ask for permission" as opposed to "ask for forgiveness" which 
is both inefficient (due to the redundancy of first checking and then 
actioning) and—when combined with (1)—does not protect against a potential race 
condition.
   3. The `validate()` method tends to augment state—likely to address the 
inefficiency of (2)—which is an anti-pattern.
   
   #### Examples
   
   The following examples illustrate where the validation occurs at the wrong 
place:
   
   - The 
[FavStar](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/models/core.py#L1004-L1011)
 model has no uniqueness constraint on the (`user_id`, `class_name`, `obj_id`) 
tuple meaning that the uniqueness validation check is wrongfully handled by the 
[`ChartDAO.favorited_ids()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/chart.py#L39-L51)
 and 
[`DashboardDAO.favorited_ids()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/dashboard.py#L284-L296)
 methods when invoking 
[`ChartDAO.add_favorite()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/chart.py#L53-L65),
 
[`DashboardDAO.add_favorite()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/dashboard.py#L339-L351),
 etc.
   - The 
[Annotation](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/models/annotations.py#L40-L68)
 model has no uniqueness constraint on the `short_descr` column meaning that 
the uniqueness validation check is wrongfully handled in the 
[`AnnotationDAO.validate_update_uniqueness()`](https://github.com/apache/superset/blob/master/superset/daos/annotation.py#L27-L46)
 method.
   - The 
[`TagDAO.validate_tag_name()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/tag.py#L49-L55)
 method checks for invalid characters yet this can be handled by either a CHECK 
constraint or SQLAlchemy `validates()` decorator.
   - The 
[`DeleteChartCommand.validate()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/charts/commands/delete.py#L58-L74)
 method unnecessarily checks to see that there are no associated alerts or 
reports whereas it should rely solely on the 
[`ReportSchedule.chart_id`](https://github.com/apache/superset/blob/master/superset/reports/models.py#L132)
 FOREIGN KEY constraint.
   - There exists pairs of methods, i.e.,  
[`DashboardDAO.validate_slug_uniqueness()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/dashboard.py#L165-L170)
 and 
[`DashboardDAO.validate_update_slug_uniqueness()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/dashboard.py#L172-L179),
 for validating uniqueness used for creating and updating an entity 
respectively where the later (rightfully) includes an additional filter to 
exclude itself. This differentiation (and thus added complexity) is not 
required if there existed a corresponding UNIQUE constraint combined with the 
“ask for forgiveness” approach.
   
   The following examples illustrate where the validation checks could be 
violated:
   
   - The 
[`ChartDAO.add_favorite()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/chart.py#L53C1-L65)
 method first checks whether the user has favorited said chart and if not 
creates the favorite as opposed to simply trying to favorite the chart and 
relying on a UNIQUE constraint to block (at the database level) the creation if 
it already exists.
   
   The following examples illustrate where validation augments state:
   
   - The 
[`UpdateDatabaseCommand.validate()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/databases/commands/update.py#L168-L182)
 method assigns the `self._model` attribute after fetching the model so it can 
be leveraged during the run phase.
   - The 
[`CreateChartCommand.validate()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/charts/commands/create.py#L55-L84)
 method assigns the `self._model` attribute and augments the 
`self._properities` attribute.
   - The 
[`ChartWarmUpCacheCommand.validate()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/charts/commands/warm_up_cache.py#L102-L108)
 method reassigns the `self._chart_or_id` attribute if the variable is an ID. 
Post validation the attribute name is misleading given it only pertains to a 
`Chart` object—hence 
[this](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/charts/commands/warm_up_cache.py#L50)
 confusing statement.
   
   ### Proposed Change
   
   - Validation should occur at the lowest possible (left most) level, given 
not all database operations originate at the DAO or Command level.
   - The DAO and Command should primarily be used to validate business logic, 
e.g., ownership, access controls, etc.
   - We should adopt a "ask for forgiveness" as opposed to "ask for permission" 
type approach by leveraging `try`/`except` blocks.
   - The `BaseCommand.validate()` method should be removed in favor of in-place 
validation, i.e., leveraging the “validate as you go” approach during the 
`run()` phase.
   
   #### Examples
   
   The following code illustrates how the model and DAO should be constructed, 
where it is evident that the model validation is handled at the database level 
and the DAO uses the “ask for forgiveness” approach*.
   
   ```python
   class FavStar(Model):
       __tablename__ = "favstar"
       __table_args__ = (UniqueConstraint("user_id", "class_name", "obj_id"),)
   
       id = Column(Integer, primary_key=True)
       user_id = Column(Integer, ForeignKey("ab_user.id"))
       class_name = Column(String(50))
       obj_id = Column(Integer)
   ```
   
   ```python
   class ChartDAO(BaseDAO[Slice]):
       def favorite(chart: Slice) -> None:
           try:
               db.session.begin_nested():
                   db.session.add(
                       FavStar(
                           class_name=FavStarClassName.CHART,
                           obj_id=chart.id,
                           user_id=get_user_id(),
                       )
                   )
           except IntegrityError:
               pass  # Already favorited.
   ```
   
   * The `try`/`except` approach is based on the fact that the code block will 
mostly succeed (which is the case in this example given one can really only 
favorite an object which is unfavorited). Additionally this approach removes 
the redundant check of first checking whether said item has already been 
favorited by the user.
   
   The following code illustrates the use of a CHECK constraint in conjunction 
with the “ask for forgiveness” approach.
   
   ```python
   class Tag(Model):
       __tablename__ = "tag"
       __table_args__ = (CheckConstraint(func.regexp_like(name, "^[:,]")),)
   
       id = Column(Integer, primary_key=True)
       name = Column(String(250), unique=True)
       type = Column(Enum(TagType))
       description = Column(Text)
   ```
   
   ```python
   TagDAO(BaseDAO[Tag]):
       def create(object_type: ObjectType, object_id: int, name: str) -> None:
           try:
               with db.session.begin_nested():
                   db.session.add(
                       TaggedObject(
                           object_id=object_id,
                           object_type=object_type,
                           tag=TagDAO.get_by_name(name, TagType.custom)
                       )
                   )
           except IntegrityError as ex:
               raise DAOCreateFailedError() from ex
   ```
   
   ### New or Changed Public Interfaces
   
   None.
   
   ### New Dependencies
   
   None.
   
   ### Migration Plan and Compatibility
   
   None.
   
   ### Rejected Alternatives
   
   None.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[I] SIP-99C] Proposal for model and business validation [superset]

Reply via email to