eschutho opened a new issue, #19953: URL: https://github.com/apache/superset/issues/19953
## [SIP-\<number>] Proposal for Chart creation without a dataset<title> ### Motivation Currently a user needs to create a dataset for each chart that they want to create. Many times these charts aren’t kept for long, usually either never making it to a dashboard or because someone just wants a quick view of their data to share for feedback or to gain insight into their own queries, tables, etc. A lot of new users don’t understand what a dataset is or why they need it. We want to allow people to progressively move into dataset usage, and allow them to create a chart quickly based on either a query, saved query, table or dataset. When they save we will prompt them to name a dataset, which will be a much lower barrier to visualizing their data quickly. ### Proposed Change Users should be able to create a chart from the chart page, from sql lab, or from a dataset. From explore or SQL Lab, they need to be able to view a chart, apply filters, see a list of columns in their query or table just as they do now, but without creating a dataset. If coming from a dataset view, they should be able to continue to use a dataset to back a chart as they can currently. This solution is based on this new approved flow: [https://github.com/apache/superset/discussions/18584](https://github.com/apache/superset/discussions/18584) 1st PR for chart creation with a query is here: [https://github.com/apache/superset/pull/19812/files](https://github.com/apache/superset/pull/19812/files) We will be [[creating a mixin](https://github.com/apache/superset/pull/19812)](https://github.com/apache/superset/pull/19812) as part of SIP 68 that contains all of the necessary functionality to power a chart. By extending that mixing to other models that have the necessary relationships (database, schema, columns) those models can also be used to power a chart. Our plan is to start with Queries and introduce them in a dark release. Next moving on to SavedQuery and then the new SL_table. We’ll finish up with the new SL_dataset and then once we confirmed and tested, we can start to point all of the charts over to the SL_dataset. Currently all `SqlaTable`s are dual writing to SL_dataset. We currently have two types of datasources in the config, SqlaTable (Dataset) and the Druid Datasource. If a chart connects to something, it should be a datasource. It follows in line with the methodology of what we are trying to achieve and doesn’t add in any complicated middle layers, and will be very extendable. With SIP 68 we will be removing Druid NoSQL Datasource and the datasource as a config and instead limiting the datasources to those classes that have the functionality needed to power a chart. As part of SIP 68, the datasource configs are going to be removed and datasources will no longer be configurable. We currently are working on converting the `ConnectorRegistry` which uses the configs to a [[DatasourceDAO](https://github.com/apache/superset/pull/19811)](https://github.com/apache/superset/pull/19811). - **Charts by Tables:** - Import/export - since a chart cannot be saved until it has a dataset, this is n/a for now - Explore/Dashboard view - When selecting a table as a datasource, we would create a `sl_table` instance and save it to the chart as a datasource. The `sl_table` would have all the column information needed to power the explore view. - On save, we just create the dataset to point to the already created Table. - SQL Lab to explore - This only applies to queries - **Charts by Queries :** - Import/export - since a chart cannot be saved until it has a dataset, this is n/a for now - Explore/Dashboard view - The `Query` would have a new relationship to `Column` (i.e., `sl_columns`) for all the column information needed to power the explore view. This has the potential to create a lot of duplicate columns for `Queries`. Another option, since a Query is immutable, is to save the columns in a json blob. - If it saves time/effort we are evaluating the possibility of not having cache for `Queries`. - SQL Lab to explore: - A chart will be linked to a `Query` from this flow. This is the only way that someone can create a chart from a `Query`. - On save, we create a dataset and add the query as the `expression` - **Charts by SavedQueries** - Import/export - since a chart cannot be saved until it has a dataset, this is n/a for now - Explore/Dashboard view - When selecting a SavedQuery as a datasource, we would tie that object to the chart. The SavedQuery would have a new relationship to `Column` (i.e., `sl_columns`) for all the column information needed to power the explore view. - On save, we create a dataset and add the query as the `expression` - SQL Lab to explore: - n/a - **Charts by Dataset:** - We need to update the old `SqlaTable` to a new `Sl_dataset` as part of SIP68. Everything else will be the same.` ### New or Changed Public Interfaces New UI flows are described here: https://github.com/apache/superset/discussions/18584 ### New dependencies Describe any `npm`/`PyPI` packages that are required. Are they actively maintained? What are their licenses? ### Migration Plan and Compatibility We will need to add columns to Queries and SavedQueries ### Rejected Alternatives 1) Create a temporary dataset without explicitly asking the user to do anything **Pros**: Simple for engineering, seamless, not much extra work. **Cons**: Users will see a bloated list of datasets in their dataset crud view and won’t know what they are. 1b. Mark these datasets as hidden and don’t show them on the CRUD page. **Pros**: Simple, easy to build. Users don’t see extra datasets. **Cons**: It gets complicated to have two different types of datasets, especially now that we are cleaning up the virtual vs physical. Now we would have hidden and visible, but we’re saying that the chart is backed by a query table, when in reality it’s not 2) Create a dataset just during the request cycle **Pros**: Doesn’t bloat the user’s CRUD list; There aren’t two types of datasets that we have to deal with **Cons**: It’s also complicated to create a dataset each time and could slow down performance, especially if we have to query their database too often. 3) Request the column data from the db each time we need that information **Pros**: We don’t need to store any extra data except on the client side. **Cons**: Poor performance, and could incur extra cost to the user for db usage. 4) Make a lightweight dataset by storing just column data in redis **Pros**: We don’t need to deal with any database models and/or database **Cons**: We are adding a separate middleware to the models when we don’t need to. Plus we would need to write up all of the logic for storing/retrieving the data. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
