This is an automated email from the ASF dual-hosted git repository.

sfirke pushed a commit to branch sfirke-add-arch-page
in repository https://gitbox.apache.org/repos/asf/superset.git
commit a536cdca0f57613da4cdb8a2e25af056f24e0a90
Author: Sam Firke <[email protected]>
AuthorDate: Mon May 13 13:57:15 2024 -0400

    Create architecture.mdx
---
 docs/docs/installation/architecture.mdx | 55 +++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/docs/docs/installation/architecture.mdx b/docs/docs/installation/architecture.mdx
new file mode 100644
index 0000000000..b789647452
--- /dev/null
+++ b/docs/docs/installation/architecture.mdx
@@ -0,0 +1,55 @@
---
title: Architecture
hide_title: true
sidebar_position: 1
version: 1
---

import useBaseUrl from "@docusaurus/useBaseUrl";

# Architecture

This page gives new administrators an understanding of Superset's components.

## Components

A Superset installation is made up of these components:
1. The Superset application itself
2. A metadata database that stores Superset's data about users, charts, dashboards, etc.
3. A caching layer (optional, but required for some features)
4. A worker & beat (optional, but required for some features)

No matter how you install Superset (Kubernetes, Docker Compose, from PyPI), these are the components. Here are further details on each.

### The Superset Application

This is the core application. Superset operates like this:
- A user visits a chart or dashboard
- That triggers a SQL query to the data warehouse holding the underlying dataset
- The resulting data is served up in a data visualization

### Metadata Database

This is where chart and dashboard definitions, user information, logs, etc. are stored. Superset is tested to work with PostgreSQL and MySQL as the metadata database (not to be confused with a data source like your data warehouse, which could be one of a much wider variety of options, such as Snowflake, Redshift, etc.).

Some installation methods, like our Quickstart and PyPI, are configured by default to use an on-disk SQLite database. In a Docker Compose installation, the data is stored in a PostgreSQL container volume. Neither of these setups is recommended for production instances of Superset.

For production, a properly configured, managed, standalone database is recommended. No matter which database you use, you should plan to back it up regularly.

### Caching Layer

The caching layer serves two main functions:
- Store the results of queries to your data warehouse, so that when a chart is loaded a second time it pulls from the cache, speeding up the application and reducing load on your data warehouse.
- Act as a message broker for the worker, enabling the Alerts & Reports, async queries, and thumbnail caching features.

Most people use Redis for their cache, but Superset supports other options too. See the [cache docs](/docs/configuration/cache/) for more.

### Worker and Beat

These are one or more workers that execute tasks (such as running async queries or taking snapshots of reports and sending emails), plus a "beat" that acts as the scheduler and tells the workers when to perform their tasks. Most installations use Celery for these components.

## Other components

Other components can be incorporated into Superset. The best place to learn about additional configurations is the [Configuration page](/docs/configuration/configuring-superset). For instance, you could set up a load balancer or reverse proxy to serve HTTPS in front of your Superset application, or specify a Mapbox API key to enable geospatial charts.

Superset won't even start without certain configuration settings in place, so it's essential to review that page.
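To make the components above concrete, here is a minimal, illustrative `superset_config.py` sketch that wires together a PostgreSQL metadata database, a Redis cache, and a Celery broker. The hostnames, ports, database names, and credentials are placeholders, not defaults; see the Configuration page for the authoritative options.

```python
# superset_config.py -- illustrative sketch only; all hostnames,
# credentials, and database names below are placeholders.

# Metadata database: a managed, standalone PostgreSQL instance
# is recommended for production.
SQLALCHEMY_DATABASE_URI = "postgresql://superset:superset@db-host:5432/superset"

# Caching layer: store query/chart results in Redis.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_URL": "redis://cache-host:6379/0",
}

# Worker & beat: Celery uses Redis as its message broker here.
class CeleryConfig:
    broker_url = "redis://cache-host:6379/1"
    result_backend = "redis://cache-host:6379/2"

CELERY_CONFIG = CeleryConfig
```

With a configuration along these lines, the worker and beat processes would then be launched separately (for example via `celery` against Superset's Celery app), while the main application reads the same config file.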
