[GitHub] [superset] eschutho opened a new issue, #19953: SIP- Chart creation without a dataset

GitBox Wed, 04 May 2022 09:35:11 -0700


eschutho opened a new issue, #19953:
URL: https://github.com/apache/superset/issues/19953

## [SIP-\<number>] Proposal for Chart creation without a dataset

### Motivation

Currently, a user must create a dataset for every chart they want to create.
Many of these charts are short-lived: they never make it to a dashboard, or
someone just wants a quick view of their data to share for feedback or to gain
insight into their own queries, tables, etc. Many new users don’t understand
what a dataset is or why they need one. We want to let people move into dataset
usage progressively and create a chart quickly from a query, saved query,
table, or dataset. When they save, we will prompt them to name a dataset, which
is a much lower barrier to visualizing their data quickly.

### Proposed Change

Users should be able to create a chart from the chart page, from SQL Lab, or
from a dataset. From Explore or SQL Lab, they need to be able to view a chart,
apply filters, and see a list of columns in their query or table just as they
do now, but without creating a dataset. If coming from a dataset view, they
should be able to continue to use a dataset to back a chart as they can
currently.

This solution is based on this new approved flow:
[https://github.com/apache/superset/discussions/18584](https://github.com/apache/superset/discussions/18584)

1st PR for chart creation with a query is here:
[https://github.com/apache/superset/pull/19812/files](https://github.com/apache/superset/pull/19812/files)

We will be [creating a mixin](https://github.com/apache/superset/pull/19812)
as part of SIP 68 that contains all of the functionality necessary to power a
chart. By extending that mixin to other models that have the necessary
relationships (database, schema, columns), those models can also be used to
power a chart. Our plan is to start with Queries and introduce them in a dark
release, then move on to SavedQuery and the new SL_table. We’ll finish with the
new SL_dataset, and once we have confirmed and tested it, we can start to point
all of the charts over to the SL_dataset. Currently all `SqlaTable`s are dual
writing to SL_dataset.
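
To illustrate the mixin idea, here is a minimal sketch in plain Python. It is not the code from the linked PR: the names `ExploreMixin`, `source_sql`, and `build_chart_query`, and the simplified `Query` and `Table` classes, are assumptions made up for this example. The point is only that any model exposing a database, a schema, and columns can share the same chart-powering logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    type: str


class ExploreMixin:
    """Hypothetical mixin: shared logic for any model that can back a chart.

    The host class is assumed to provide `database`, `schema`, and `columns`.
    """

    database: str
    schema: Optional[str]
    columns: List[Column]

    def column_names(self) -> List[str]:
        return [c.name for c in self.columns]

    def source_sql(self) -> str:
        # Each model decides what SQL it contributes as the chart's source.
        raise NotImplementedError

    def build_chart_query(self, selected: List[str], limit: int = 100) -> str:
        # Reject columns the model does not know about before building SQL.
        unknown = [c for c in selected if c not in self.column_names()]
        if unknown:
            raise ValueError(f"Unknown columns: {unknown}")
        cols = ", ".join(selected)
        return f"SELECT {cols} FROM ({self.source_sql()}) AS src LIMIT {limit}"


@dataclass
class Query(ExploreMixin):
    database: str
    schema: Optional[str]
    sql: str
    columns: List[Column] = field(default_factory=list)

    def source_sql(self) -> str:
        return self.sql


@dataclass
class Table(ExploreMixin):
    database: str
    schema: Optional[str]
    name: str
    columns: List[Column] = field(default_factory=list)

    def source_sql(self) -> str:
        qualified = f"{self.schema}.{self.name}" if self.schema else self.name
        return f"SELECT * FROM {qualified}"
```

In this sketch a `Query` and a `Table` both gain `build_chart_query` without either one being a dataset, which mirrors the rollout order described above.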

We currently have two types of datasources in the config: SqlaTable
(Dataset) and the Druid Datasource. If a chart connects to something, it should
be a datasource. This is in line with what we are trying to achieve, doesn’t
add any complicated middle layers, and will be very extensible. With SIP 68 we
will be removing the Druid NoSQL Datasource and the datasource as a config, and
instead limiting the datasources to those classes that have the functionality
needed to power a chart.

As part of SIP 68, the datasource configs are going to be removed and
datasources will no longer be configurable. We are currently working on
converting the `ConnectorRegistry`, which uses the configs, to a
[DatasourceDAO](https://github.com/apache/superset/pull/19811).
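
As a rough illustration of replacing a config-driven registry with a fixed set of chart-capable classes, the sketch below hard-codes the mapping from datasource type to model class. It does not reproduce the DAO in the linked PR: the class and method names, the `DatasourceType` values, and the in-memory row store standing in for a database session are all assumptions for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Tuple


class DatasourceType(str, Enum):
    # Hypothetical type names, following the models listed in this proposal.
    TABLE = "table"
    QUERY = "query"
    SAVED_QUERY = "saved_query"
    DATASET = "dataset"


@dataclass
class Datasource:
    id: int


class Table(Datasource):
    pass


class Query(Datasource):
    pass


class SavedQuery(Datasource):
    pass


class Dataset(Datasource):
    pass


class DatasourceNotFound(Exception):
    pass


class DatasourceDAO:
    # The supported classes are fixed in code, not read from configuration.
    SOURCES: Dict[DatasourceType, type] = {
        DatasourceType.TABLE: Table,
        DatasourceType.QUERY: Query,
        DatasourceType.SAVED_QUERY: SavedQuery,
        DatasourceType.DATASET: Dataset,
    }

    def __init__(self, rows: Iterable[Datasource]) -> None:
        # In-memory stand-in for a database session, keyed by (class, id).
        self._rows: Dict[Tuple[type, int], Datasource] = {
            (type(row), row.id): row for row in rows
        }

    def get(self, datasource_type: str, datasource_id: int) -> Datasource:
        try:
            model = self.SOURCES[DatasourceType(datasource_type)]
        except ValueError as ex:
            raise DatasourceNotFound(
                f"Unsupported datasource type: {datasource_type}"
            ) from ex
        row = self._rows.get((model, datasource_id))
        if row is None:
            raise DatasourceNotFound(f"{datasource_type} {datasource_id} not found")
        return row
```

With this shape, asking for an unsupported type such as `"druid"` fails with `DatasourceNotFound` instead of depending on what a deployment has configured.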

- **Charts by Tables:**
    - Import/export
        - since a chart cannot be saved until it has a dataset, this is n/a
          for now
    - Explore/Dashboard view
        - When selecting a table as a datasource, we would create a
          `sl_table` instance and save it to the chart as a datasource. The
          `sl_table` would have all the column information needed to power
          the explore view.
        - On save, we just create the dataset to point to the already
          created Table.
    - SQL Lab to explore
        - This only applies to queries
- **Charts by Queries:**
    - Import/export
        - since a chart cannot be saved until it has a dataset, this is n/a
          for now
    - Explore/Dashboard view
        - The `Query` would have a new relationship to `Column` (i.e.,
          `sl_columns`) for all the column information needed to power the
          explore view. This has the potential to create a lot of duplicate
          columns for `Queries`. Another option, since a Query is immutable,
          is to save the columns in a json blob.
        - If it saves time/effort we are evaluating the possibility of not
          having cache for `Queries`.
    - SQL Lab to explore:
        - A chart will be linked to a `Query` from this flow. This is the
          only way that someone can create a chart from a `Query`.
        - On save, we create a dataset and add the query as the `expression`
- **Charts by SavedQueries**
    - Import/export
        - since a chart cannot be saved until it has a dataset, this is n/a
          for now
    - Explore/Dashboard view
        - When selecting a SavedQuery as a datasource, we would tie that
          object to the chart. The SavedQuery would have a new relationship
          to `Column` (i.e., `sl_columns`) for all the column information
          needed to power the explore view.
        - On save, we create a dataset and add the query as the `expression`
    - SQL Lab to explore:
        - n/a
- **Charts by Dataset:**
    - We need to update the old `SqlaTable` to a new `Sl_dataset` as part of
      SIP 68. Everything else will be the same.
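
The "on save" step for query-backed charts can be sketched as follows. This is only an illustration under stated assumptions: `save_chart`, the `columns_json` field (the JSON-blob option mentioned for immutable queries), and the simplified `Query`, `Dataset`, and `Chart` classes are hypothetical and do not correspond to actual Superset models.

```python
import json
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Query:
    # A Query is immutable, so its column metadata can live in a JSON blob.
    id: int
    sql: str
    columns_json: str


@dataclass
class Dataset:
    name: str
    expression: str
    columns: List[dict] = field(default_factory=list)


@dataclass
class Chart:
    name: str
    datasource: Union[Query, Dataset]


def save_chart(chart: Chart, dataset_name: str) -> Chart:
    """Promote a query-backed chart to a dataset-backed chart on save."""
    source = chart.datasource
    if isinstance(source, Dataset):
        # Already backed by a dataset; nothing to promote.
        return chart
    if not dataset_name.strip():
        raise ValueError("A dataset name is required to save this chart")
    chart.datasource = Dataset(
        name=dataset_name,
        expression=source.sql,  # the query text becomes the `expression`
        columns=json.loads(source.columns_json),
    )
    return chart
```

Until the user saves, the chart stays tied to the `Query` itself, so nothing appears in the dataset list before the user has named a dataset.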

### New or Changed Public Interfaces

New UI flows are described here:
https://github.com/apache/superset/discussions/18584

### New dependencies

Describe any `npm`/`PyPI` packages that are required. Are they actively
maintained? What are their licenses?

### Migration Plan and Compatibility

We will need to add columns to Queries and SavedQueries.

### Rejected Alternatives

1) Create a temporary dataset without explicitly asking the user to do
anything

**Pros**: Simple for engineering, seamless, not much extra work.

**Cons**: Users will see a bloated list of datasets in their dataset
crud view and won’t know what they are.

1b) Mark these datasets as hidden and don’t show them on the CRUD page.

**Pros**: Simple, easy to build. Users don’t see extra datasets.

**Cons**: It gets complicated to have two different types of datasets,
especially now that we are cleaning up the virtual vs physical distinction. Now
we would have hidden and visible, but we’re saying that the chart is backed by
a query table, when in reality it’s not.

2) Create a dataset just during the request cycle

**Pros**: Doesn’t bloat the user’s CRUD list; there aren’t two types of
datasets that we have to deal with.

**Cons**: It’s also complicated to create a dataset each time and could
slow down performance, especially if we have to query their database too often.

3) Request the column data from the db each time we need that information

**Pros**: We don’t need to store any extra data except on the client
side.

**Cons**: Poor performance, and could incur extra cost to the user for
db usage.

4) Make a lightweight dataset by storing just column data in redis

**Pros**: We don’t need to deal with any database models and/or database

**Cons**: We are adding a separate middleware to the models when we
don’t need to. Plus we would need to write up all of the logic for
storing/retrieving the data.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
