willbarrett opened a new issue #9077: [SIP-35] Proposal for Improving Superset’s Python Code Organization URL: https://github.com/apache/incubator-superset/issues/9077 ## [SIP] Proposal for Improving Superset’s Python Code Organization ### Motivation As I was in the weeds fixing a bunch of Pylint failures, Max and I started going back and forth on this PR: https://github.com/apache/incubator-superset/pull/8777, which we ultimately closed. The root cause of that was a lack of shared understanding on the best code structure for Superset going forward. Talking with others at Preset, I realized that the issue was larger than just a new contributor not understanding project practices. Without a shared understanding, we are lacking a cohesive approach towards refactoring Superset - we need a technical North Star for project structure. Preset’s developers met and identified a number of pain points in the existing code base. Here they are, with a bit of color to make the meaning clear: * Lack of separation of concerns * There are a number of modules that have grown organically over time to truly massive size. Often these modules contain utility functions, classes of multiple types, and a broad array of other functionality. These modules are very hard to work with. * Business logic is frequently intermixed with display logic, making code reuse more difficult. * The model layer contains a large amount of business logic which has nothing to do with data storage. * Dependency Confusion * Circular dependency issues crop up regularly with the existing module design * Many modules have implicit dependencies on other modules which requires a specific load order * Lack of Composability * There’s a substantial amount of app-specific code in shared modules * Views are currently loaded in all contexts, which makes light-weight command line usage (for example) impossible * Deep Class Inheritance * In some instances, particularly where there is pre-existing inheritance in Flask-AppBuilder, the inheritance chain can be over 10 classes deep. This can make understanding some classes very challenging. ### Proposed Change In order to address these concerns, we’d like to propose a few guiding principles for Python code structure going forward: * Prefer small, functionally-homogeneous modules with only classes and pure functions in module scope * Separate business logic from display logic * Keep the view layer lightweight and delegate to business logic objects * To support this work, we’d like to recommend a Command pattern - more on that below * Prefer thin ORM objects * To support this work, we’d like to recommend introducing Data Access Objects or DAOs to Superset to hold complicated query logic. More on that below. * Keep `__init__.py` files empty * This will help cut down on implicit dependencies * Prefer composition over inheritance * Collaborator objects can be simpler to understand than a deeply-nested class hierarchy. * Organize modules function and back-end objects rather than products * Prefer: * Dashboard * Chart * Query * Database * Models * api * Avoid: * SQL Lab * Explore #### New patterns to introduce: ##### Command: There are multiple patterns named Command, and the one we reference here is most similar to an Alembic migration. Commands perform a single cohesive business operation, have a very small public API, and either entirely succeed or entirely fail. In practice, this requires that database transaction behavior be controlled at the Command level. When possible, commands should be modeled such that they perform a single database transaction. In the case of failure, they raise an exception or return an error code, and it is the responsibility of the caller to translate that into a correct response for a user. Example command: ```python # This is obviously not a full finished command. The point # is to illustrate the structure of a command as # encapsulating a business operation, and illustrate # a likely public API. import DashboardDAO from daos.dashboard class MissingDashboardError(CommandError): pass class UpdateFailedError(CommandError): pass class InvalidDashboardError(CommandError): pass class UpdateDashboardCommand < Command: def __init__(current_user, dashboard_id, dashboard_properties): self.actor = current_user self.dashboard_id = dashboard_id self.dashboard_properties = dashboard_properties def run(): self.dashboard = DashboardDAO.find_dashboard(actor, dashboard_id) if self.dashboard IS NONE: raise MissingDashboardError("Dashboard could not be found") if !self._is_dashboard_valid(): raise InvalidDashboardError("Dashboard parameters are invalid.") update_result = DashboardDAO.update(dashboard, dashboard_properties) if update_result != True: raise UpdateFailedError("Dashboard could not be updated.") def _is_dashboard_valid(): # Check validity of dashboard data before save ``` ##### DAO (Data Access Object): A DAO in the context of Superset would be an object that would manage the lifecycle of SQLAlchemy models. Custom queries would be implemented in DAOs and then called from Command objects. In comparison to Command objects, DAOs have relatively broad public interfaces. DAOs should not depend on other DAOs to avoid circular dependencies. If results are needed cross-DAO, that should be orchestrated in the Command. Here’s a sample simplified DAO for illustrative purposes: ```python # This is not a fully finished DAO. The intention # is to illustrate the public interface of a # Data Access Object, and how it might interface # with a SQLAlchemy model. from superset import db import DataAccessObject from daos.base import Dashboard from models.dashboard class DashboardDAO(DataAccessObject): @staticmethod def find_for_update(actor, dashboard_id): dashboard = db.session.query(Dashboard).filter(id=dashboard_id).one_or_none() if dashboard if actor in dashboard.owners: return dashboard else return None else return None @staticmethod def update(dashboard, dashboard_params): for key, value in dashboard_params.iteritems(): setattr(dashboard, key, value) try: db.session.merge(dashboard) db.session.commit() except SQLAlchemyError as e: logger.error('Failed to update dashboard: ?', e.message) db.session.rollback() return False return True ``` #### Proposed example package structure that follows the above principles: * superset/dashboards/api.py * Contains FAB and Flask endpoints related to the public API * superset/dashboards/schemas.py * Contains marshmallow schemas used by both the api and view endpoints * superset/dashboards/boot.py * Contains initialization and configuration code for API endpoints * Invoked by `create_app()` * superset/dashboards/views.py * Contains FAB and Flask endpoints related to page-refresh views. * superset/dashboards/dao.py * superset/dashboards/commands/add_user.py * superset/dashboards/commands/create.py * superset/dashboards/helpers.py * superset/dashboards/models/dashboard.py * Models are thin SQLAlchemy objects - they do not implement custom queries or filters. That functionality will be found in the DAO, which manipulates the underlying SQLAlchemy model. * superset/charts/api.py * superset/charts/daos.py * superset/charts/boot.py * superset/charts/schemas.py * superset/charts/commands/add_to_dashboard.py * superset/charts/commands/create.py * superset/charts/helpers.py * superset/charts/models/chart.py In this design, all systems related to a specific back-end resource have been grouped under a top-level folder. `__init__.py` files should be left empty to enable only pulling in the portions of the system necessary for a specific entrypoint (Celery shouldn’t need `api.py` or `views.py` for instance) ### New or Changed Public Interfaces Over time, the internals of Superset will evolve towards the new structure. Public HTTP interfaces will not be likely to change as a result of the above proposal, but code will move and alter to conform. This will impact organizations that apply their own customizations to Superset. ### New dependencies None ### Migration Plan and Compatibility Introduce refactors to existing code at a manageable pace to allow organizations relying on Superset internals time to adapt. ### Rejected Alternatives Preset discussed Service Objects as an alternative to both Commands and DAOs. We felt that Commands provided easier entrypoints for the ports of our application (API endpoints, views, command line invocations, Celery tasks) than Service Objects, and that introducing DAOs as well helped further break down concerns. We also considered structuring top-level folders by function (api, models, etc.) but found this resulted in drastically more Python modules overall without substantially simplifying the question of where code should live. ### Individuals consulted in creating this SIP @mistercrunch @craig-rueda @dpgaspar @robdiciuccio @suddjian @nytai
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
