willbarrett opened a new issue #9077: [SIP-35] Proposal for Improving 
Superset’s Python Code Organization
URL: https://github.com/apache/incubator-superset/issues/9077
 
 
   ## [SIP] Proposal for Improving Superset’s Python Code Organization
   
   ### Motivation
   
   As I was in the weeds fixing a bunch of Pylint failures, Max and I started 
going back and forth on this PR: 
https://github.com/apache/incubator-superset/pull/8777, which we ultimately 
closed. The root cause of that was a lack of shared understanding on the best 
code structure for Superset going forward. Talking with others at Preset, I 
realized that the issue was larger than just a new contributor not 
understanding project practices. Without a shared understanding, we are lacking 
a cohesive approach towards refactoring Superset - we need a technical North 
Star for project structure.
   
   Preset’s developers met and identified a number of pain points in the 
existing code base. Here they are, with a bit of color to make the meaning 
clear:
   
   * Lack of separation of concerns
       * There are a number of modules that have grown organically over time to 
truly massive size. Often these modules contain utility functions, classes of 
multiple types, and a broad array of other functionality. These modules are 
very hard to work with.
       * Business logic is frequently intermixed with display logic, making 
code reuse more difficult.
       * The model layer contains a large amount of business logic which has 
nothing to do with data storage.
   * Dependency Confusion
       * Circular dependency issues crop up regularly with the existing module 
design
       * Many modules have implicit dependencies on other modules which 
requires a specific load order
    * Lack of Composability
       * There’s a substantial amount of app-specific code in shared modules
       * Views are currently loaded in all contexts, which makes light-weight 
command line usage (for example) impossible
   * Deep Class Inheritance
       * In some instances, particularly where there is pre-existing 
inheritance in Flask-AppBuilder, the inheritance chain can be over 10 classes 
deep. This can make understanding some classes very challenging.
   
   
   ### Proposed Change
   
   In order to address these concerns, we’d like to propose a few guiding 
principles for Python code structure going forward:
   
   * Prefer small, functionally-homogeneous modules with only classes and pure 
functions in module scope
   * Separate business logic from display logic
       * Keep the view layer lightweight and delegate to business logic objects
       * To support this work, we’d like to recommend a Command pattern - more 
on that below
   * Prefer thin ORM objects
       * To support this work, we’d like to recommend introducing Data Access 
Objects or DAOs to Superset to hold complicated query logic. More on that below.
   *  Keep `__init__.py` files empty
       * This will help cut down on implicit dependencies
   * Prefer composition over inheritance
       * Collaborator objects can be simpler to understand than a deeply-nested 
class hierarchy.
   * Organize modules function and back-end objects rather than products
       * Prefer:
           * Dashboard
           * Chart
           * Query
           * Database
           * Models
           * api
       * Avoid:
           * SQL Lab
           * Explore
   
   #### New patterns to introduce:
   
   ##### Command:
   
   There are multiple patterns named Command, and the one we reference here is 
most similar to an Alembic migration. Commands perform a single cohesive 
business operation, have a very small public API, and either entirely succeed 
or entirely fail. In practice, this requires that database transaction behavior 
be controlled at the Command level. When possible, commands should be modeled 
such that they perform a single database transaction. In the case of failure, 
they raise an exception or return an error code, and it is the responsibility 
of the caller to translate that into a correct response for a user.
   
   Example command:
   ```python
   # This is obviously not a full finished command. The point
   # is to illustrate the structure of a command as
   # encapsulating a business operation, and illustrate
   # a likely public API.
   
   import DashboardDAO from daos.dashboard
   
   class MissingDashboardError(CommandError):
     pass
   
   class UpdateFailedError(CommandError):
     pass
   
   class InvalidDashboardError(CommandError):
     pass
   
   class UpdateDashboardCommand < Command:
     def __init__(current_user, dashboard_id, dashboard_properties):
       self.actor = current_user
       self.dashboard_id = dashboard_id
       self.dashboard_properties = dashboard_properties
       
     def run():
       self.dashboard = DashboardDAO.find_dashboard(actor, dashboard_id)
       if self.dashboard IS NONE:
         raise MissingDashboardError("Dashboard could not be found")
         
       if !self._is_dashboard_valid():
         raise InvalidDashboardError("Dashboard parameters are invalid.")
       
       update_result = DashboardDAO.update(dashboard, dashboard_properties)
       
       if update_result != True:
         raise UpdateFailedError("Dashboard could not be updated.")
         
     
     def _is_dashboard_valid():
       # Check validity of dashboard data before save
   ```
   
   ##### DAO (Data Access Object):
   
   A DAO in the context of Superset would be an object that would manage the 
lifecycle of SQLAlchemy models. Custom queries would be implemented in DAOs and 
then called from Command objects. In comparison to Command objects, DAOs have 
relatively broad public interfaces. DAOs should not depend on other DAOs to 
avoid circular dependencies. If results are needed cross-DAO, that should be 
orchestrated in the Command. Here’s a sample simplified DAO for illustrative 
purposes:
   
   ```python
   # This is not a fully finished DAO. The intention
   # is to illustrate the public interface of a 
   # Data Access Object, and how it might interface
   # with a SQLAlchemy model.
   
   from superset import db
   import DataAccessObject from daos.base
   import Dashboard from models.dashboard
   
   class DashboardDAO(DataAccessObject):
     
     @staticmethod
     def find_for_update(actor, dashboard_id):
       dashboard = 
db.session.query(Dashboard).filter(id=dashboard_id).one_or_none()
       if dashboard
         if actor in dashboard.owners:
           return dashboard
         else
           return None
       else
         return None
     
     @staticmethod
     def update(dashboard, dashboard_params):
       for key, value in dashboard_params.iteritems():
         setattr(dashboard, key, value)
       try:
         db.session.merge(dashboard)
         db.session.commit()
       except SQLAlchemyError as e:
         logger.error('Failed to update dashboard: ?', e.message)
         db.session.rollback()
         return False
         
       return True
   ```
   
   #### Proposed example package structure that follows the above principles:
   
   * superset/dashboards/api.py
       * Contains FAB and Flask endpoints related to the public API
   * superset/dashboards/schemas.py
       * Contains marshmallow schemas used by both the api and view endpoints
   * superset/dashboards/boot.py
       * Contains initialization and configuration code for API endpoints
       * Invoked by `create_app()`
   * superset/dashboards/views.py
        * Contains FAB and Flask endpoints related to page-refresh views.
   * superset/dashboards/dao.py
   * superset/dashboards/commands/add_user.py
   * superset/dashboards/commands/create.py
   * superset/dashboards/helpers.py
   * superset/dashboards/models/dashboard.py
       * Models are thin SQLAlchemy objects - they do not implement custom 
queries or filters. That functionality will be found in the DAO, which 
manipulates the underlying SQLAlchemy model.
   * superset/charts/api.py
   * superset/charts/daos.py
   * superset/charts/boot.py
   * superset/charts/schemas.py
   * superset/charts/commands/add_to_dashboard.py
   * superset/charts/commands/create.py
   * superset/charts/helpers.py
   * superset/charts/models/chart.py
   
   In this design, all systems related to a specific back-end resource have 
been grouped under a top-level folder. `__init__.py` files should be left empty 
to enable only pulling in the portions of the system necessary for a specific 
entrypoint (Celery shouldn’t need `api.py` or `views.py` for instance)
   
   ### New or Changed Public Interfaces
   
   Over time, the internals of Superset will evolve towards the new structure. 
Public HTTP interfaces will not be likely to change as a result of the above 
proposal, but code will move and alter to conform. This will impact 
organizations that apply their own customizations to Superset.
   
   ### New dependencies
   
   None
   
   ### Migration Plan and Compatibility
   
   Introduce refactors to existing code at a manageable pace to allow 
organizations relying on Superset internals time to adapt.
   
   ### Rejected Alternatives
   
   Preset discussed Service Objects as an alternative to both Commands and 
DAOs. We felt that Commands provided easier entrypoints for the ports of our 
application (API endpoints, views, command line invocations, Celery tasks) than 
Service Objects, and that introducing DAOs as well helped further break down 
concerns.
   
   We also considered structuring top-level folders by function (api, models, 
etc.) but found this resulted in drastically more Python modules overall 
without substantially simplifying the question of where code should live.
   
   
   ### Individuals consulted in creating this SIP
   @mistercrunch @craig-rueda @dpgaspar @robdiciuccio @suddjian @nytai 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to