gwierzchowski opened a new issue #15092:
URL: https://github.com/apache/superset/issues/15092
## [SIP] Proposal for exporting data from datasets as jinja keys to be used
in other charts
### Motivation
We (at Hitachi-ABB Power Grids R&D) are trying to use Superset to build
dashboards that present different measures related to energetic grids'
performance and reliability. Definitions of those measures are fixed and
regulated by respective norms. Many of them are defined as a division of one
value by other, and here we face some problem. Nominators and denominators in
our formulas are being calculated from different datasets (based on queries on
PostgreSQL).
E.g. Inside BigNumber chart (and other charts like line chart) we want to
display number `SUM("DurationMin")/COUNT(DISTINCT "CustomerCode")` where
"DurationMin" is taken from other joined tables than "CustomerCode". And we
want this calculation to be properly updated when we change selections in
different filters. The problem is that we have for instance filter on data like
Extent which only applies to dataset related to nominator (DurationMin) and is
completely irrelevant to Consumer dataset (in means that there is no connection
between tables). Even worse both datasets depend on some time-stamped data but
with different granularity. So result is that when we try to apply some filters
our value presented in BigNumber is being calculated improperly or we get
division by zero.
This generally narrows down to need of having Charts that depend on two (or
>1) datasets that are differently and independently filtered.
We tried at least 2 workarounds (described below in Alternatives section)
but none worked well or was very hacky and not elegant.
### Proposed Change
We would like to make following changes in Superset in order to implement
our dashboards in elegant way.
- Add possibility for datasets to export several values (could be some SQL
expression - e.g. aggregation or so performed on dataset data after applying
filter) to dictionary accessible by jinja (or other configured template
engine). The keys would be strings, and the values could be either numeric,
string or array/list in the sense of templating engine.
- Add possibility for dashboards to add plain datasets to them (in addition
to charts) and allow user to configure which filters apply to which datasets
(as it is for charts now). Added datasets could be placed in some special area
of dashboard presented in GUI only in edition mode but hidden (working in
background) in run-mode.
- Make necessary changes to ensure that added datasets are calculated first,
and then charts when dashboard needs recalculation.
- Make values exported by datasets to be accessible by Charts - i.e. could
be used in Chart custom SQL and be processed by Jinja/Templating engine before
query is passed to DB.
- Possibly write and register some Jinja functions to reformat values as
proper SQL (or other backends) - e.g. Array as `{ val, val2, val3 }` .
- Perform some optimization in case when the the same dataset is added
directly to dashboard (to make it export values and be calculated first) and is
used by chart in the same dashboard. Optimization should make dataset to be
calculated only once.
- Possible other optimization (or maybe explicit option) would be in case
when some Chart will use only exported values - i.e. should be only processed
by Jinja. And it is in order to avoid sending literals-only queries to database.
With those changes we could have 2 datasets in the dashboard that export
`nom` and `denom` respectively and define our Chart to display: `{{nom}} /
{{denom}}` (or even make jinja if expression to divide by 1 if denominator is
zero).
We think that such changes could highly extend "calculation possibilities"
of Superset and also other users will benefit. The good thing about this is
that all this could be done without necessity to implement own parsing /
expression processing engine. We will use backend DB engine and Jinja / Python
engines with already present in Superset possibility to register custom Jinja
methods.
We have some both back-end (speaking in Python) and front-end (speaking in
Typescript and React) engineers to try implementing those changes and prepare
PR, but still need approval for work from company.
### New or Changed Public Interfaces
I see possible to points where interfaces should be extended:
- Exporting values from dataset (new dataset property)
- Adding datasets to dashboards
I think they could be done in backsword compatible way.
### New dependencies
I do not for-see any new dependencies.
### Migration Plan and Compatibility
Currently defined dashboards could assume none plain datasets added, and
existing datasets not export any values. This should assure backward
compatibility.
### Rejected Alternatives
We considered 2 workarounds:
1. Define dataset as several concatenated queries - like:
```
SELECT
'part1' as "part",
aaa,
bbb,
NULL as "ccc",
ddd
FROM tab1
UNION ALL
SELECT
'part2' as "part",
aaa,
bbb,
ccc,
NULL as "ddd"
FROM tab2
```
where `ccc` can not be derived from `tab1` by any means joins etc. and vice
versa with `ddd`.
And then inside Chart try to properly calculate something. It did not worked
well when we had filter against one of NULL-able value e.g. `ddd`. When we set
same value in filter it caused second part of query to return no records, but
for our calculations we need it to not react on that filter at all. Other cons
was that in practice the query was quite long having many NULLs at both sides
and made it hard to maintain and also less performant (as whole query is redone
when anyone filter change).
2. Inside SQL injections in our Charts write one of parts as nested select
query: e.g. instead: `SUM("DurationMin")/COUNT(DISTINCT "CustomerCode")` we
tried `SUM("DurationMin")/( SELECT .... )`. This worked for some cases, but
not always and led to very poor not acceptable performance.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]