[GitHub] [superset] gwierzchowski opened a new issue #15092: [SIP] Proposal for exporting data from datasets as jinja keys to be used in other charts

GitBox Thu, 10 Jun 2021 09:11:10 -0700


gwierzchowski opened a new issue #15092:
URL: https://github.com/apache/superset/issues/15092



   ## [SIP] Proposal for exporting data from datasets as jinja keys to be used 
in other charts
   
   ### Motivation
   
   We (at Hitachi-ABB Power Grids R&D) are trying to use Superset to build 
dashboards that present different measures related to energetic grids' 
performance and reliability. Definitions of those measures are fixed and 
regulated by respective norms. Many of them are defined as a division of one 
value by other, and here we face some problem. Nominators and denominators in 
our formulas are being calculated from different datasets (based on queries on 
PostgreSQL). 
   
   E.g. Inside BigNumber chart (and other charts like line chart) we want to 
display number `SUM("DurationMin")/COUNT(DISTINCT "CustomerCode")` where 
"DurationMin" is taken from other joined tables than "CustomerCode". And we 
want this calculation to be properly updated when we change selections in 
different filters. The problem is that we have for instance filter on data like 
Extent which only applies to dataset related to nominator (DurationMin) and is 
completely irrelevant to Consumer dataset (in means that there is no connection 
between tables). Even worse both datasets depend on some time-stamped data but 
with different granularity. So result is that when we try to apply some filters 
our value presented in BigNumber is being calculated improperly or we get 
division by zero. 
   
   This generally narrows down to need of having Charts that depend on two (or 
>1) datasets that are differently and independently filtered.
   
   We tried at least 2 workarounds (described below in Alternatives section) 
but none worked well or was very hacky and not elegant.
   
   ### Proposed Change
   
   We would like to make following changes in Superset in order to implement 
our dashboards in elegant way.
   
   - Add possibility for datasets to export several values (could be some SQL 
expression - e.g. aggregation or so performed on dataset data after applying 
filter) to dictionary accessible by jinja (or other configured template 
engine). The keys would be strings, and the values could be either numeric, 
string or array/list in the sense of templating engine.
   - Add possibility for dashboards to add plain datasets to them (in addition 
to charts) and allow user to configure which filters apply to which datasets 
(as it is for charts now). Added datasets could be placed in some special area 
of dashboard presented in GUI only in edition mode but hidden (working in 
background) in run-mode.
   - Make necessary changes to ensure that added datasets are calculated first, 
and then charts when dashboard needs recalculation.
   - Make values exported by datasets to be accessible by Charts - i.e. could 
be used in Chart custom SQL and be processed by Jinja/Templating engine before 
query is passed to DB.
   - Possibly write and register some Jinja functions to reformat values as 
proper SQL (or other backends) - e.g. Array as `{ val, val2, val3 }` .
   - Perform some optimization in case when the the same dataset is added 
directly to dashboard (to make it export values and be calculated first) and is 
used by chart in the same dashboard. Optimization should make dataset to be 
calculated only once.
   - Possible other optimization (or maybe explicit option) would be in case 
when some Chart will use only exported values - i.e. should be only processed 
by Jinja. And it is in order to avoid sending literals-only queries to database.
   
   With those changes we could have 2 datasets in the dashboard that export 
`nom` and `denom` respectively and define our Chart to display: `{{nom}} / 
{{denom}}` (or even make jinja if expression to divide by 1 if denominator is 
zero).
   
   We think that such changes could highly extend "calculation possibilities" 
of Superset and also other users will benefit. The good thing about this is 
that all this could be done without necessity to implement own parsing / 
expression processing engine. We will use backend DB engine and Jinja / Python 
engines with already present in Superset possibility to register custom Jinja 
methods.
   
   We have some both back-end (speaking in Python) and front-end (speaking in 
Typescript and React) engineers to try implementing those changes and prepare 
PR, but still need approval for work from company.
   
   ### New or Changed Public Interfaces
   
   I see possible to points where interfaces should be extended:
   
   - Exporting values from dataset (new dataset property)
   - Adding datasets to dashboards
   
   I think they could be done in backsword compatible way.
   
   ### New dependencies
   
   I do not for-see any new dependencies.
   
   ### Migration Plan and Compatibility
   
   Currently defined dashboards could assume none plain datasets added, and 
existing datasets not export any values. This should assure backward 
compatibility. 
   
   ### Rejected Alternatives
   
   We considered 2 workarounds:
   
   1. Define dataset as several concatenated queries - like:
   ```
   SELECT
      'part1' as "part",
      aaa,
      bbb,
      NULL as "ccc",
      ddd
   FROM tab1
   UNION ALL
   SELECT
      'part2' as "part",
      aaa,
      bbb,
      ccc,
      NULL as "ddd"
   FROM tab2
   ```
   where `ccc` can not be derived from `tab1` by any means joins etc. and vice 
versa with `ddd`.
   And then inside Chart try to properly calculate something. It did not worked 
well when we had filter against one of NULL-able value  e.g. `ddd`. When we set 
same value in filter it caused second part of query to return no records, but 
for our calculations we need it to not react on that filter at all. Other cons 
was that in practice the query was quite long having many NULLs at both sides 
and made it hard to maintain and also less performant (as whole query is redone 
when anyone filter change).
   
   2. Inside SQL injections in our Charts write one of parts as nested select 
query: e.g. instead: `SUM("DurationMin")/COUNT(DISTINCT "CustomerCode")` we 
tried `SUM("DurationMin")/( SELECT  .... )`. This worked for some cases, but 
not always and led to very poor not acceptable performance.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [superset] gwierzchowski opened a new issue #15092: [SIP] Proposal for exporting data from datasets as jinja keys to be used in other charts

Reply via email to