eschutho opened a new issue, #19953:
URL: https://github.com/apache/superset/issues/19953

   ## [SIP-\<number>] Proposal for Chart creation without a dataset<title>
   
   ### Motivation
   
   Currently a user needs to create a dataset for each chart that they want to 
create. Many times these charts aren’t kept for long, usually either never 
making it to a dashboard or because someone just wants a quick view of their 
data to share for feedback or to gain insight into their own queries, tables, 
etc. A lot of new users don’t understand what a dataset is or why they need it. 
We want to allow people to progressively move into dataset usage, and allow 
them to create a chart quickly based on either a query, saved query, table or 
dataset. When they save we will prompt them to name a dataset, which will be a 
much lower barrier to visualizing their data quickly.
   
   ### Proposed Change
   
   Users should be able to create a chart from the chart page, from sql lab, or 
from a dataset. From explore or SQL Lab, they need to be able to view a chart, 
apply filters, see a list of columns in their query or table just as they do 
now, but without creating a dataset. If coming from a dataset view, they should 
be able to continue to use a dataset to back a chart as they can currently. 
   
   This solution is based on this new approved flow: 
[https://github.com/apache/superset/discussions/18584](https://github.com/apache/superset/discussions/18584)
   
   1st PR for chart creation with a query is here: 
[https://github.com/apache/superset/pull/19812/files](https://github.com/apache/superset/pull/19812/files)
   
   We will be [[creating a 
mixin](https://github.com/apache/superset/pull/19812)](https://github.com/apache/superset/pull/19812)
 as part of SIP 68 that contains all of the necessary functionality to power a 
chart. By extending that mixing to other models that have the necessary 
relationships (database, schema, columns) those models can also be used to 
power a chart. Our plan is to start with Queries and introduce them in a dark 
release. Next moving on to SavedQuery and then the new SL_table. We’ll finish 
up with the new SL_dataset and then once we confirmed and tested, we can start 
to point all of the charts over to the SL_dataset. Currently all `SqlaTable`s 
are dual writing to SL_dataset.
   
   We currently have two types of datasources in the config, SqlaTable 
(Dataset) and the Druid Datasource. If a chart connects to something, it should 
be a datasource. It follows in line with the methodology of what we are trying 
to achieve and doesn’t add in any complicated middle layers, and will be very 
extendable. With SIP 68 we will be removing Druid NoSQL Datasource and the 
datasource as a config and instead limiting the datasources to those classes 
that have the functionality needed to power a chart.
   
   As part of SIP 68, the datasource configs are going to be removed and 
datasources will no longer be configurable. We currently are working on 
converting the `ConnectorRegistry` which uses the configs to a 
[[DatasourceDAO](https://github.com/apache/superset/pull/19811)](https://github.com/apache/superset/pull/19811).
 
   
   - **Charts by Tables:**
       - Import/export
           - since a chart cannot be saved until it has a dataset, this is n/a 
for now
       - Explore/Dashboard view
           - When selecting a table as a datasource, we would create a 
`sl_table` instance and save it to the chart as a datasource. The `sl_table` 
would have all the column information needed to power the explore view.
           - On save, we just create the dataset to point to the already 
created Table.
       - SQL Lab to explore
           - This only applies to queries
   - **Charts by Queries :**
       - Import/export
           - since a chart cannot be saved until it has a dataset, this is n/a 
for now
       - Explore/Dashboard view
           - The `Query` would have a new relationship to `Column` (i.e., 
`sl_columns`) for all the column information needed to power the explore view. 
This has the potential to create a lot of duplicate columns for `Queries`. 
Another option, since a Query is immutable, is to save the columns in a json 
blob.
           - If it saves time/effort we are evaluating the possibility of not 
having cache for `Queries`.
       - SQL Lab to explore:
           - A chart will be linked to a `Query` from this flow. This is the 
only way that someone can create a chart from a `Query`.
           - On save, we create a dataset and add the query as the `expression`
   - **Charts by SavedQueries**
       - Import/export
           - since a chart cannot be saved until it has a dataset, this is n/a 
for now
       - Explore/Dashboard view
           - When selecting a SavedQuery as a datasource, we would tie that 
object to the chart. The SavedQuery would have a new relationship to `Column` 
(i.e., `sl_columns`) for all the column information needed to power the explore 
view.
           - On save, we create a dataset and add the query as the `expression`
       - SQL Lab to explore:
           - n/a
   - **Charts by Dataset:**
       - We need to update the old `SqlaTable` to a new `Sl_dataset` as part of 
SIP68. Everything else will be the same.`
   
   ### New or Changed Public Interfaces
   
   New UI flows are described here: 
   https://github.com/apache/superset/discussions/18584
   
   ### New dependencies
   
   Describe any `npm`/`PyPI` packages that are required. Are they actively 
maintained? What are their licenses?
   
   ### Migration Plan and Compatibility
   
   We will need to add columns to Queries and SavedQueries
   
   ### Rejected Alternatives
   
   1) Create a temporary dataset without explicitly asking the user to do 
anything
   
       **Pros**: Simple for engineering, seamless, not much extra work.
   
       **Cons**: Users will see a bloated list of datasets in their dataset 
crud view and won’t know what they are.
   
       1b. Mark these datasets as hidden and don’t show them on the CRUD page. 
   
       **Pros**: Simple, easy to build. Users don’t see extra datasets.
   
       **Cons**: It gets complicated to have two different types of datasets, 
especially now that we are cleaning up the virtual vs physical. Now we would 
have hidden and visible, but we’re saying that the chart is backed by a query 
table, when in reality it’s not
   
   2) Create a dataset just during the request cycle
   
       **Pros**: Doesn’t bloat the user’s CRUD list; There aren’t two types of 
datasets that we have to deal with
   
       **Cons**: It’s also complicated to create a dataset each time and could 
slow down performance, especially if we have to query their database too often.
   
   3) Request the column data from the db each time we need that information
   
       **Pros**: We don’t need to store any extra data except on the client 
side.
   
       **Cons**: Poor performance, and could incur extra cost to the user for 
db usage.
   
   4) Make a lightweight dataset by storing just column data in redis
   
       **Pros**: We don’t need to deal with any database models and/or database
   
       **Cons**: We are adding a separate middleware to the models when we 
don’t need to. Plus we would need to write up all of the logic for 
storing/retrieving the data.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to