ktmud opened a new issue #9887:
URL: https://github.com/apache/incubator-superset/issues/9887


   *Disclaimer: regardless of the details embedded, this document is a 
conceptual proposal intended to start conversations. We are not committed to 
implementing all or any of the proposed changes yet.*
   
   # [SIP] Proposal for Unified Chart Controls
   
   ### Motivation
   
   After obtaining tabular data from datasources, visualizations need to know 
how to map data columns to marks and channels. Sometimes the visualization 
requires, or users want, additional transformations before plotting (computing 
moving averages, pivoting, aggregating, transposing, sampling, windowing, etc.) 
that are impossible or inconvenient to express at the datasource/SQL query 
level.
   
   The concept of post-processing is not new to Superset. Historically, a few 
charts such as the line chart have offered some of these transformations under 
the name *advanced analytics*, but they lack clarity and consistency.
   
   __Consistency:__ Most of these transformations are implemented in 
[viz.py](https://github.com/apache/incubator-superset/blob/5ab5457522a141139958a52c88e021c3e5a50ad7/superset/viz.py),
 with each visualization having its own non-standardized post-processor. This 
Python module has grown out of control and become difficult to maintain. The 
coupling between visualization and query response makes it difficult to improve 
some important features such as chart slice caching, embeddable charts and 
visualization plugins.
   
   We have been slowly migrating to a new visualization-independent query API 
([SIP-5](https://github.com/apache/incubator-superset/issues/5680), 
[SIP-6](https://github.com/apache/incubator-superset/issues/5692), 
[#6220](https://github.com/apache/incubator-superset/pull/6220)). The idea is 
to decouple data querying from visualizations and move most 
visualization-specific transformations to the frontend. The new API will always 
return tabular data. It is up to each visualization plugin to take the output, 
optionally run it through a generalized server-side post-processing API first 
([#9427](https://github.com/apache/incubator-superset/pull/9427)), and then pass 
the tabular data to the visualization code, which handles the remaining 
visualization-specific transformations on the client side.
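To make the decoupling concrete, here is a minimal sketch of what a request to the new chart data API might look like when it carries a server-side post-processing step. Only the `/api/v1/chart/data` endpoint and the `post_processing` field come from this proposal; the surrounding field names (`datasource`, `queries`, `metrics`, `groupby`), the `{"operation": ..., "options": ...}` step shape, and the `build_chart_data_payload` helper are assumptions for illustration.

```python
import json
from typing import Any, Dict, List


def build_chart_data_payload(
    datasource_id: int,
    metrics: List[str],
    groupby: List[str],
    post_processing: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Assemble a hypothetical request body for POST /api/v1/chart/data.

    The query part only describes *what* data to fetch; the optional
    post_processing list describes server-side transforms applied to the
    tabular result. Nothing in the payload names a visualization type.
    """
    return {
        "datasource": {"id": datasource_id, "type": "table"},  # assumed shape
        "queries": [
            {
                "metrics": metrics,
                "groupby": groupby,
                # Assumed step shape: {"operation": <name>, "options": {...}}
                "post_processing": post_processing,
            }
        ],
    }


payload = build_chart_data_payload(
    datasource_id=1,
    metrics=["bookings"],
    groupby=["state"],
    post_processing=[
        {"operation": "rolling", "options": {"window": 7, "rolling_type": "mean"}},
    ],
)
print(json.dumps(payload, indent=2))
```

The point of the design is that the same payload can serve any chart: the visualization plugin decides how to render the returned table, not how it was queried.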
   
   __Clarity:__ Some post-processing can be implicit but straightforward. For 
example, when users want to add moving averages to a line chart, we simply 
introduce a form control that computes a derived column, and it is easy to 
infer how that column should be presented in the visualization. However, while 
there is room for abstraction, this approach requires developers to create 
custom controls for every visualization, each with its own `transformProps` 
logic. Like many basic chart controls in Superset, these controls affect both 
data manipulation and presentation, so they unavoidably bind data querying to 
visualizations. They may be easy to use in one simple chart, but they quickly 
become confusing when many different controls are spread across many 
visualizations: both developers and users sometimes have to guess what’s 
happening under the hood.
   
   __Flexibility:__ In addition to lacking clarity and consistency, the custom 
control approach also falls short of supporting more powerful visualizations. 
For example, the following table chart (with mock data) is common in top-line 
business reports. It compactly displays multiple metrics (bookings and revenue) 
across multiple dimensions (state, user type) and multiple time periods 
(point-in-time measurement and 7-day moving average):
   
   
![Snip20200521_5](https://user-images.githubusercontent.com/335541/82700410-fc10c500-9c22-11ea-9534-c4aca5777f58.png)
   
   Suppose each metric and dimension is a column in the database, and each row 
holds their values on a given date. Currently, to create this output table, 
users have to write very complex SQL queries and use a virtual datasource. 
However, the same chart can be built quickly by combining pandas 
post-processing operators, without writing complex database queries.
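As an illustration of what such an operator chain computes, the library-free sketch below produces the rows of the table above from mock daily data: the latest value and a trailing 7-day average of each metric, per (state, user type) group. In Superset this would be expressed as pandas post-processing steps (for example a rolling window); the mock data, column names, and the `summarize` helper are hypothetical, and the trailing window assumes exactly one row per day in each group.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical mock data: one row per (day, state, user_type).
rows: List[Row] = []
for day in range(10):
    for state, user_type, base in [("CA", "new", 100), ("CA", "returning", 200), ("NY", "new", 50)]:
        rows.append({
            "ds": date(2020, 5, 1) + timedelta(days=day),
            "state": state,
            "user_type": user_type,
            "bookings": base + day,
            "revenue": (base + day) * 10,
        })


def summarize(rows: List[Row], metrics: List[str], dims: List[str], window: int = 7) -> List[Row]:
    """Latest value plus trailing `window`-row average of each metric, per dimension group.

    Roughly what "sort by date -> rolling mean per group -> keep the latest row"
    would produce; assumes exactly one row per day in each group.
    """
    groups: Dict[tuple, List[Row]] = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["ds"]):
        groups[tuple(row[d] for d in dims)].append(row)

    table = []
    for key, group in sorted(groups.items(), key=lambda item: item[0]):
        latest = group[-1]
        out: Row = dict(zip(dims, key))
        for metric in metrics:
            out[metric] = latest[metric]  # point-in-time measurement
            out[f"{metric}_{window}d_avg"] = mean(r[metric] for r in group[-window:])
        table.append(out)
    return table


for line in summarize(rows, ["bookings", "revenue"], ["state", "user_type"]):
    print(line)
```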
   
   ### Proposed Change
   
   To solve the challenges above, we propose to (1) add a __Transform__ section 
for server-side post-processing and (2) rearrange the __Customize__ controls in 
the control panel.
   
   <img width="400" src="https://user-images.githubusercontent.com/335541/82695411-e3e87800-9c19-11ea-8a4e-35fe1582e815.png" />
   
   #### The Transform Section
   
   In the Transform section, users can specify stackable atomic transform 
operators mapped directly to the 
[pandas_postprocessing](https://github.com/apache/incubator-superset/blob/a52cfcd234aa7d506c5d0ee659492e772940dbba/superset/utils/pandas_postprocessing.py)
 API already implemented in the backend. For example, the screenshot below 
shows the controls popup for the *Rolling Window* transformation:
   
   
![Snip20200520_9](https://user-images.githubusercontent.com/335541/82698525-c3232100-9c1f-11ea-9142-bd4838196211.png)
   
   This one-to-one mapping between transform controls and post-processing 
operators also simplifies documentation: we can just point users to the 
Python API spec for 
[pandas_postprocessing](https://github.com/apache/incubator-superset/blob/a52cfcd234aa7d506c5d0ee659492e772940dbba/superset/utils/pandas_postprocessing.py).
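A minimal sketch of how stackable atomic operators could be applied: each control contributes one `{"operation", "options"}` step, the step name is looked up in a registry, and the output of one step feeds the next. The `rolling` and `select` functions below are simplified pure-Python stand-ins loosely modeled on the backend operators, not the actual `pandas_postprocessing` implementations; in particular, this `rolling` uses a partial window for the leading rows (similar to pandas' `min_periods=1`), whereas pandas defaults to producing missing values there. The registry and `apply_post_processing` are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def rolling(rows: List[Row], *, columns: Dict[str, str], window: int) -> List[Row]:
    """Trailing mean over `window` rows, writing each source column to a target column.

    Leading rows use a partial window (like pandas' min_periods=1).
    """
    out = []
    for i, row in enumerate(rows):
        chunk = rows[max(0, i - window + 1): i + 1]
        new_row = dict(row)  # copy so the input table is left untouched
        for source, target in columns.items():
            new_row[target] = sum(r[source] for r in chunk) / len(chunk)
        out.append(new_row)
    return out


def select(rows: List[Row], *, columns: List[str]) -> List[Row]:
    """Keep only the listed columns, in the given order."""
    return [{c: row[c] for c in columns} for row in rows]


# One Transform control per operator: the control's step name is the registry key.
OPERATORS: Dict[str, Callable[..., List[Row]]] = {
    "rolling": rolling,
    "select": select,
}


def apply_post_processing(rows: List[Row], steps: List[Dict[str, Any]]) -> List[Row]:
    """Apply stacked steps in order; each step's output feeds the next."""
    for step in steps:
        operator = OPERATORS[step["operation"]]
        rows = operator(rows, **step.get("options", {}))
    return rows


data = [{"ds": d, "bookings": b} for d, b in enumerate([10, 20, 30, 40])]
steps = [
    {"operation": "rolling", "options": {"columns": {"bookings": "bookings_avg"}, "window": 2}},
    {"operation": "select", "options": {"columns": ["ds", "bookings_avg"]}},
]
print(apply_post_processing(data, steps))
```

Because each control maps to exactly one registered function, the function's own documentation can double as the control's documentation.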
   
   After applying a transformation, users should be able to view the 
intermediate and final tabular data (*the final results returned by the server 
will always be tabular*). We can add a button to switch between data view and 
chart view:
   
   
![Snip20200520_12](https://user-images.githubusercontent.com/335541/82695344-c0253200-9c19-11ea-848c-db315a4ce9d5.png)
   
   Alternatively, we could implement [the 
split view](https://user-images.githubusercontent.com/812905/72408878-05d86800-3719-11ea-947d-8fe28fd6802b.png)
 proposed in SIP-34.
   
   Clicking an item in the Transform accordion list will switch the data view 
to that step’s intermediate result.
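One way to support this view is to keep a snapshot of the table after every step, as in the sketch below. The `run_with_intermediates` helper and the lambda steps are hypothetical illustrations; a UI could display snapshot *i* when the *i*-th accordion item is selected and the last snapshot as the final result.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Step = Tuple[str, Callable[[List[Row]], List[Row]]]


def run_with_intermediates(rows: List[Row], steps: List[Step]) -> List[Tuple[str, List[Row]]]:
    """Return (label, table) snapshots: the raw query result, then one per step."""
    snapshots = [("query result", rows)]
    for label, transform in steps:
        rows = transform(rows)
        snapshots.append((label, rows))
    return snapshots


raw = [
    {"state": "CA", "bookings": 3},
    {"state": "NY", "bookings": 5},
    {"state": "CA", "bookings": 4},
]
steps: List[Step] = [
    ("sort by bookings", lambda rs: sorted(rs, key=lambda r: r["bookings"], reverse=True)),
    ("select bookings", lambda rs: [{"bookings": r["bookings"]} for r in rs]),
]
for label, table in run_with_intermediates(raw, steps):
    print(label, table)
```

Keeping every snapshot trades memory for instant switching; a server could instead recompute the prefix of steps on demand.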
   
   #### The Customize Section
   
   Superset introduced separate *Data* and *Customize* tabs to reduce clutter 
in the control panel. This helps charts with many options, but it also makes 
the Customize options difficult to discover: many users don’t even know they 
can add pagination to the table chart.
   
   To simplify the user experience, we intend to move the Customize tab to a 
new Customize section under the main tab and add a __Columns__ section before 
other chart rendering options.
   
   The Columns section configures per-column metadata such as d3 format, 
tooltip template, suffix/prefix, and conditional formatting. Not all 
visualizations have customizable rendering options, but all of them will have 
the *Columns* section, which stores metadata corresponding to the final tabular 
data output. We believe that with a refined UI hierarchy, it’s possible to 
resurface the Customize controls without creating clutter.
   
   The full mockup can be found 
[here](https://www.figma.com/proto/O28oGbqtjVFTYrAxgyLrBJ/Superset-Chart-Explore-Redesign?node-id=24%3A22&scaling=min-zoom).
   
   #### Long-term Plan
   
   In the future, all visualization controls will follow three simple steps: 
Query (datasource queries) → Transform (server-side post-processing) → 
Customize (chart rendering). This is akin to the visualization grammar used by 
[Vega](https://vega.github.io/vega/) and 
[Vega-Lite](https://vega.github.io/vega-lite/). This separation of concerns and 
unification of control semantics make it clear what each control is 
responsible for.
   
   By moving column mapping and chart rendering logic to *Customize*, we can 
also remove control overrides in the *Query* section. Currently there are too 
many variants of the `metrics`, `columns`, and `groupby` fields that query the 
same thing but are stored differently 
([apache-superset/superset-ui#485](https://github.com/apache-superset/superset-ui/pull/485)
 provided a mask to help developers, but end users may still be confused). 
Removing them not only greatly simplifies the code, but also makes it easier to 
switch between visualization types: all control values for *Query*, 
*Transform*, and even *Columns* can be easily retained.
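A minimal sketch of what switching visualization types could look like once form data is split into these sections: *Query*, *Transform*, and *Columns* carry over unchanged, and only rendering options the new chart understands are kept. The section names, the `SUPPORTED_RENDER_OPTIONS` table, the option names, and `switch_viz_type` are all hypothetical.

```python
from typing import Any, Dict, Set

# Hypothetical: each viz type declares which rendering options it understands.
SUPPORTED_RENDER_OPTIONS: Dict[str, Set[str]] = {
    "table": {"page_length", "color_scheme"},
    "line": {"show_legend", "color_scheme"},
}

# Sections that are independent of the visualization type.
SHARED_SECTIONS = ("query", "transform", "columns")


def switch_viz_type(form_data: Dict[str, Any], new_viz_type: str) -> Dict[str, Any]:
    """Keep Query, Transform and Columns as-is; keep only rendering options the new type supports."""
    supported = SUPPORTED_RENDER_OPTIONS[new_viz_type]
    switched = {section: form_data[section] for section in SHARED_SECTIONS}
    switched["viz_type"] = new_viz_type
    switched["customize"] = {k: v for k, v in form_data["customize"].items() if k in supported}
    return switched


form_data = {
    "viz_type": "table",
    "query": {"metrics": ["bookings"], "groupby": ["state"]},
    "transform": [{"operation": "rolling", "options": {"window": 7}}],
    "columns": {"bookings": {"number_format": ",d"}},
    "customize": {"page_length": 20, "color_scheme": "default"},
}
print(switch_viz_type(form_data, "line"))
```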
   
   In the future (beyond the scope of this SIP), if editing all the 
transformations one by one becomes too tiresome, there are many ways to 
simplify the Transform section for users:
   
   1. We can hide complex operations behind custom operators that are either a 
preset of other operators or an arbitrary Python function running in a sandbox.
   2. We can add a Simple vs. Advanced mode switch. Advanced mode is what is 
described above. In Simple mode, users specify transformations using fewer 
controls, similar to *Advanced Analytics*. Each control could potentially 
represent one or more transformations with reasonable defaults.
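Both ideas can share one mechanism: a preset that expands a single Simple-mode control into a list of atomic steps. In the sketch below, `moving_average_preset`, the `PRESETS` registry, `expand_simple_controls`, the `ds` time column default, and the step option names are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Step = Dict[str, Any]


def moving_average_preset(metric: str, window: int = 7, time_column: str = "ds") -> List[Step]:
    """Hypothetical Simple-mode control: one input expands into two atomic steps."""
    return [
        {"operation": "sort", "options": {"columns": {time_column: True}}},
        {
            "operation": "rolling",
            "options": {
                "columns": {metric: f"{metric}_{window}d_avg"},
                "rolling_type": "mean",
                "window": window,
            },
        },
    ]


PRESETS: Dict[str, Callable[..., List[Step]]] = {"moving_average": moving_average_preset}


def expand_simple_controls(simple_controls: Dict[str, Dict[str, Any]]) -> List[Step]:
    """Translate Simple-mode control values into the Advanced-mode step list."""
    steps: List[Step] = []
    for name, params in simple_controls.items():
        steps.extend(PRESETS[name](**params))
    return steps


print(expand_simple_controls({"moving_average": {"metric": "bookings"}}))
```

Switching from Simple to Advanced mode could then just display the expanded step list for further editing.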
   
   ### New or Changed Public Interfaces
   
   - New React components for the Transform controls and popup modals, which 
will interact with the `post_processing` field in the new API 
(`/api/v1/chart/data`).
   - New shared controls for the Columns section.
   - A new design pattern for the control panel to optimize the hierarchy of 
existing controls (data source, time, etc.).
   - Refactor control panel config registrations and chart control overrides.
   
   ### New dependencies
   
   No new NPM/Pip dependencies needed.
   
   ### Migration Plan and Compatibility
   
   This change has profound implications for the way we think about charting in 
Superset. In addition to exposing the post-processing API, it also proposes a 
change to the organization of Query controls. To reach the ultimate 
unification, we must take a step-by-step approach. For each visualization type, 
we have to do the following:
   
   - Step 1: migrate to the new chart data API
   - Step 2: add Transform controls and implement the data vs. chart split view
   - Step 3: move Customize controls to the main tab
   - Step 4: rename and simplify Query controls (some database migrations may be needed)
   
   For Step 2, we can start by adding the stackable advanced transform 
controls, then create the simple mode while migrating each chart.
   
   ### Rejected Alternatives
   
   * __Add yet another custom control or visualization type for the advanced 
table chart:__ the use case is too specific, and the underlying transformation 
involves so many operations that wrapping them in a single control would make 
it too opaque to users.
   * __Only add the Transform section to charts that need it:__ this does not 
solve the long-standing problem of inconsistent and speculative chart controls, 
yet it still introduces a new pattern for charting.
   
   

