[GitHub] [superset] ktmud commented on issue #20630: [WIP] [SIP-92] Proposal for restructuring the Python code base

GitBox Sun, 30 Oct 2022 18:10:09 -0700


ktmud commented on issue #20630:
URL: https://github.com/apache/superset/issues/20630#issuecomment-1296418873

I like this! Changing `superset/{datasets,dashboard...}/{api,command,dao}`
to `superset/{api,commands,daos}/{dataset,dashboard,...}` would at least reduce
the amount of top-level folders and make it easier to navigate for people who
are just working on the APIs. And more often than not, you'd probably work on
APIs utilizing multiple models rather than working on both models and the API
of a single entity type at the same time.

I also like that the core SQL engine gets its own folder. It's long overdue.

One thing I'm not sure about is can we make it faster to find all models
that correspond to a db table? I assume they will all be moved to `dao/...`,
but is the nesting under `dao/{entity_type}/models.py` too deep still?

Another thing that has been bothering me is---there seems to be too many
boilerplates and parameters passing around between the API, the commands and
the DAO. Also, what is the boundary between a command and a DAO method anyway?
Currently it seems a very thick validation layer can be added to either the
[DAO](https://github.com/apache/superset/blob/bc435e08d01b87efcf8774f29a7078cee8988e39/superset/datasets/dao.py)
or a
[command](https://github.com/apache/superset/blob/4d192e6e4d74157c1eb8fed63df7ddaee4c8ecf7/superset/reports/commands/alert.py).

So I wonder if we can further simplify things down to something like this:

```
superset
├── api
│ ├── legacy
│ └── v1
├── views
│ └── legacy
│ ├── core.py
│ └── ...
├── tasks
│ └── ...
├── commands
│ ├── datasets
│ ├── dashboards
│ │ ├── create_dashboard.py
│ │ ├── get_dashboard_charts.py
│ │ └── ...
│ ├── reports
│ └── ...
├── engine
│ ├── specs
│ ├── query
│ │ ├── builder
│ │ ├── executor
│ │ └── results
├── models
│ ├── base.py
│ ├── chart.py
│ ├── dashboard.py
│ ├── dataset.py
│ ├── query.py
│ ├── report.py
│ └── ...
└── utils
```

A couple of things to note:

1. Input validation that does not require pulling data from db
([e.g.](https://github.com/apache/superset/blob/4d192e6e4d74157c1eb8fed63df7ddaee4c8ecf7/superset/reports/commands/base.py#L56-L66))
would stay with the API---ideally done by the API framework directly via
custom validators (using Marshmallow or
[any](https://docs.python-cerberus.org/en/stable/)
[other](https://pydantic-docs.helpmanual.io/usage/validators/) validation
library).
2. `commands` will include any data operations (query or mutation) that
require more than one model. Classes in `models` should try their best to not
import any other models unless the dependency is strictly one directional
(i.e., it's a 1-many relationship instead of many-to-many, e.g., dataset ->
columns).
3. Validations before data writes should happen privately within the data
writer. There should be no need to export the validation methods---so we can
avoid passing parameters around for too many layers.
4. `models` contains only classes that directly corresponds to a db
table--and the base classes and mixins that support such classes.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [superset] ktmud commented on issue #20630: [WIP] [SIP-92] Proposal for restructuring the Python code base

Reply via email to