john-bodley opened a new issue, #25828: URL: https://github.com/apache/superset/issues/25828
## [SIP-99C] Proposal for model and business validation This SIP is part of the [[SIP-99] Proposal for correctly handling business logic](https://github.com/apache/superset/issues/25048) series. Specifically, it proposes formalizing where and how model and business validation should occur. ### Validation There are typically four places where validation can occur: 1. Database: via CHECK, FOREIGN KEY, or UNIQUE constraints, triggers, etc. 2. SQLAlchemy ORM: via the [`validates()`](https://docs.sqlalchemy.org/en/20/orm/mapped_attributes.html#simple-validators) decorator. 3. DAOs: via the various [`validate_*()`](https://github.com/search?q=repo%3Aapache%2Fsuperset+%22def+validate_%22+path%3A%2F%5Esuperset%5C%2Fdaos%5C%2F%2F&type=code) methods. 3. Commands: via the [`BaseCommand.validate()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/commands/base.py#L37-L43) class method. These are ranked and thus preference should be given—per the "shift left" approach—for validation to occur at the lowest level possible to ensure consistency, adherence of the DRY principle, etc. It's worth noting that validation at the database level is nuanced given how different databases handle `NULL` values from a distinct uniqueness perspective. A great example of this is ensuring that a dataset is unique—in terms of the (`schema`, `table_name`) tuple—where an explicit UNIQUE constraint [is missing](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/connectors/sqla/models.py#L523-L530), given that (as mentioned), > The reason it does not physically exist is MySQL, PostgreSQL, etc. have a different interpretation of uniqueness when it comes to `NULL` which is problematic given the schema is optional. i.e., in MySQL the (`NULL`, `foo`) and (`NULL`, `foo`) tuples are perceived as being different, yet from a dataset perspective we would perceive these as being the same object. The main challenges with our current approach are: 1. We tend to specify/enforce non-business validation logic at the DAO or Command level. 2. We tend to "ask for permission" as opposed to "ask for forgiveness" which is both inefficient (due to the redundancy of first checking and then actioning) and—when combined with (1)—does not protect against a potential race condition. 3. The `validate()` method tends to augment state—likely to address the inefficiency of (2)—which is an anti-pattern. #### Examples The following examples illustrate where the validation occurs at the wrong place: - The [FavStar](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/models/core.py#L1004-L1011) model has no uniqueness constraint on the (`user_id`, `class_name`, `obj_id`) tuple meaning that the uniqueness validation check is wrongfully handled by the [`ChartDAO.favorited_ids()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/chart.py#L39-L51) and [`DashboardDAO.favorited_ids()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/dashboard.py#L284-L296) methods when invoking [`ChartDAO.add_favorite()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/chart.py#L53-L65), [`DashboardDAO.add_favorite()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/dashboard.py#L339-L351), etc. - The [Annotation](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/models/annotations.py#L40-L68) model has no uniqueness constraint on the `short_descr` column meaning that the uniqueness validation check is wrongfully handled in the [`AnnotationDAO.validate_update_uniqueness()`](https://github.com/apache/superset/blob/master/superset/daos/annotation.py#L27-L46) method. - The [`TagDAO.validate_tag_name()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/tag.py#L49-L55) method checks for invalid characters yet this can be handled by either a CHECK constraint or SQLAlchemy `validates()` decorator. - The [`DeleteChartCommand.validate()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/charts/commands/delete.py#L58-L74) method unnecessarily checks to see that there are no associated alerts or reports whereas it should rely solely on the [`ReportSchedule.chart_id`](https://github.com/apache/superset/blob/master/superset/reports/models.py#L132) FOREIGN KEY constraint. - There exists pairs of methods, i.e., [`DashboardDAO.validate_slug_uniqueness()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/dashboard.py#L165-L170) and [`DashboardDAO.validate_update_slug_uniqueness()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/dashboard.py#L172-L179), for validating uniqueness used for creating and updating an entity respectively where the later (rightfully) includes an additional filter to exclude itself. This differentiation (and thus added complexity) is not required if there existed a corresponding UNIQUE constraint combined with the “ask for forgiveness” approach. The following examples illustrate where the validation checks could be violated: - The [`ChartDAO.add_favorite()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/daos/chart.py#L53C1-L65) method first checks whether the user has favorited said chart and if not creates the favorite as opposed to simply trying to favorite the chart and relying on a UNIQUE constraint to block (at the database level) the creation if it already exists. The following examples illustrate where validation augments state: - The [`UpdateDatabaseCommand.validate()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/databases/commands/update.py#L168-L182) method assigns the `self._model` attribute after fetching the model so it can be leveraged during the run phase. - The [`CreateChartCommand.validate()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/charts/commands/create.py#L55-L84) method assigns the `self._model` attribute and augments the `self._properities` attribute. - The [`ChartWarmUpCacheCommand.validate()`](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/charts/commands/warm_up_cache.py#L102-L108) method reassigns the `self._chart_or_id` attribute if the variable is an ID. Post validation the attribute name is misleading given it only pertains to a `Chart` object—hence [this](https://github.com/apache/superset/blob/1e37f0b41782d35d68f657bfa87aeb9055d1e6e7/superset/charts/commands/warm_up_cache.py#L50) confusing statement. ### Proposed Change - Validation should occur at the lowest possible (left most) level, given not all database operations originate at the DAO or Command level. - The DAO and Command should primarily be used to validate business logic, e.g., ownership, access controls, etc. - We should adopt a "ask for forgiveness" as opposed to "ask for permission" type approach by leveraging `try`/`except` blocks. - The `BaseCommand.validate()` method should be removed in favor of in-place validation, i.e., leveraging the “validate as you go” approach during the `run()` phase. #### Examples The following code illustrates how the model and DAO should be constructed, where it is evident that the model validation is handled at the database level and the DAO uses the “ask for forgiveness” approach*. ```python class FavStar(Model): __tablename__ = "favstar" __table_args__ = (UniqueConstraint("user_id", "class_name", "obj_id"),) id = Column(Integer, primary_key=True) user_id = Column(Integer, ForeignKey("ab_user.id")) class_name = Column(String(50)) obj_id = Column(Integer) ``` ```python class ChartDAO(BaseDAO[Slice]): def favorite(chart: Slice) -> None: try: db.session.begin_nested(): db.session.add( FavStar( class_name=FavStarClassName.CHART, obj_id=chart.id, user_id=get_user_id(), ) ) except IntegrityError: pass # Already favorited. ``` * The `try`/`except` approach is based on the fact that the code block will mostly succeed (which is the case in this example given one can really only favorite an object which is unfavorited). Additionally this approach removes the redundant check of first checking whether said item has already been favorited by the user. The following code illustrates the use of a CHECK constraint in conjunction with the “ask for forgiveness” approach. ```python class Tag(Model): __tablename__ = "tag" __table_args__ = (CheckConstraint(func.regexp_like(name, "^[:,]")),) id = Column(Integer, primary_key=True) name = Column(String(250), unique=True) type = Column(Enum(TagType)) description = Column(Text) ``` ```python TagDAO(BaseDAO[Tag]): def create(object_type: ObjectType, object_id: int, name: str) -> None: try: with db.session.begin_nested(): db.session.add( TaggedObject( object_id=object_id, object_type=object_type, tag=TagDAO.get_by_name(name, TagType.custom) ) ) except IntegrityError as ex: raise DAOCreateFailedError() from ex ``` ### New or Changed Public Interfaces None. ### New Dependencies None. ### Migration Plan and Compatibility None. ### Rejected Alternatives None. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
