betodealmeida opened a new issue, #30816:
URL: https://github.com/apache/superset/issues/30816

   # [SIP] A simpler chart API for dashboards
   
   # Motivation
   
   Historically, the Apache Superset dashboard has been built on top its 
interactive, no-code, chart builder (”Explore”). While it makes sense to reuse 
the chart API endpoints for fetching the data for charts in dashboards, in 
order to make development faster, at the end of the day these are two different 
workflows: one is an interactive chart *creator*, the other is a mostly passive 
chart *consumer*.
   
   This distinction has left considerable technical debt behind. In order for a 
dashboard to load a given chart it needs to perform a `POST` request to the the 
chart API, submitting a request payload that includes the `query` and 
`form_data` needed by the endpoint. This data is redundant, since  it’s already 
available to the backend (in fact, it needs to be fetched from the backend 
before the `POST`) and attached to the chart ID (which is also passed in the 
request). This causes the following problems:
   
   1. Dashboards need to load the full data for all of its charts, instead of 
just their IDs. This can cause noticeable latency (in our case we’ve observed a 
latency of ~400ms for larger dashboards, and of course this could potentially 
be much higher).
   2. For each chart, a `POST` request is done sending back the data that was 
just provided by the backend. Not only this is unnecessary, but it opens an 
attack vector where users can modify the form data in order to fetch 
unauthorized data — for example, in an embedded dashboard. While this should 
never be possible by design, there were multiple CVE reports exploiting this 
channel.
   3. While the `POST` requests still benefit from the Superset cache (if one 
is configured), they won’t benefit from other caches along the way (reverse 
proxies, browser cache). Replacing some of these with `GET` requests will 
provide for better dashboard performance and fewer costs, since the caching can 
be done upstream, as long as response headers are correctly returned by 
Superset.
   
   # Proposed change
   
   Currently, in order to retrieve the data needed to render a chart with ID 
**74**, a dashboard with ID **8** will issue a request like this:
   
   ```bash
   POST 
https://superset/api/v1/chart/data?form_data={"slice_id":74}&dashboard_id=8
   ```
   
   The request payload is not small:
   
   ```json
   {
     "datasource": {
       "id": 19,
       "type": "table"
     },
     "force": false,
     "queries": [
       {
         "filters": [
           {
             "col": "time_start",
             "op": "TEMPORAL_RANGE",
             "val": "No filter"
           }
         ],
         "extras": {
           "having": "",
           "where": ""
         },
         "applied_time_extras": {},
         "columns": [
           "first_time_developer"
         ],
         "metrics": [
           "count"
         ],
         "orderby": [
           [
             "count",
             false
           ]
         ],
         "annotation_layers": [],
         "row_limit": 100,
         "series_limit": 0,
         "order_desc": true,
         "url_params": {
           "native_filters_key": 
"UrzGasKJgA6DRj9-qiGtBuAiT0qEW516UqzMPUNLsq9fqY-eS4REkBL_MEcLnYW1"
         },
         "custom_params": {},
         "custom_form_data": {}
       }
     ],
     "form_data": {
       "datasource": "19__table",
       "viz_type": "pie",
       "slice_id": 74,
       "url_params": {
         "native_filters_key": 
"UrzGasKJgA6DRj9-qiGtBuAiT0qEW516UqzMPUNLsq9fqY-eS4REkBL_MEcLnYW1"
       },
       "groupby": [
         "first_time_developer"
       ],
       "metric": "count",
       "adhoc_filters": [
         {
           "clause": "WHERE",
           "comparator": "No filter",
           "expressionType": "SIMPLE",
           "operator": "TEMPORAL_RANGE",
           "subject": "time_start"
         }
       ],
       "row_limit": 100,
       "sort_by_metric": true,
       "color_scheme": "supersetColors",
       "show_labels_threshold": 5,
       "show_legend": true,
       "legendType": "scroll",
       "legendOrientation": "top",
       "label_type": "key",
       "number_format": "SMART_NUMBER",
       "date_format": "smart_date",
       "show_labels": true,
       "labels_outside": true,
       "outerRadius": 70,
       "innerRadius": 30,
       "dashboards": [
         8
       ],
       "extra_form_data": {},
       "label_colors": {},
       "shared_label_colors": {
         "No Answer": "#1FA8C9",
         "Yes": "#454E7C",
         "No": "#5AC189"
       },
       "extra_filters": [],
       "dashboardId": 8,
       "force": false,
       "result_format": "json",
       "result_type": "full"
     },
     "result_format": "json",
     "result_type": "full"
   }
   ```
   
   Again: not only the payload is relatively big, but it’s also completely 
unnecessary, since it has to be requested from the backend before the `POST` 
request is made.
   
   If the dashboard has native filters in place, they will be passed as 
additional filters in the request payload (in bold below):
   
   ```json
   {
     "datasource": {
       "id": 19,
       "type": "table"
     },
     "queries": [
       {
         "filters": [
           {
             "col": "highest_degree_earned",
             "op": "IN",
             "val": [
               "A. No high school (secondary school)"
             ]
           },
           {
             "col": "time_start",
             "op": "TEMPORAL_RANGE",
             "val": "No filter"
           }
         ],
       }
     ],
     ...
     "form_data": {
       "datasource": "19__table",
       "slice_id": 74,
       ...
       "extra_form_data": {
         "filters": [
           {
             "col": "highest_degree_earned",
             "op": "IN",
             "val": [
               "A. No high school (secondary school)"
             ]
           }
         ]
       }
     }
   }
   
   ```
   
   This SIP proposes that **chart requests without any applied native filters 
are to be replaced with `GET` requests** to a new API endpoint:
   
   ```bash
   GET https://superset/api/v1/dashboard/8/chart/74/data
   ```
   
   This has multiple benefits:
   
   1. The chart payload doesn’t need to be downloaded before the request, since 
with the change only the IDs of the chart in a given dashboard are needed.
   2. The request payload cannot be modified by malicious users.
   3. If the resource is cached by the backend, the API can set the correct 
HTTP headers so that the browser caches the response. During the lifetime of 
the cache the browser won’t need to do any network requests at all.
   
   Uses are still able to force-refresh the chart, in which case a simple 
`POST` request without a request payload is issued:
   
   ```bash
   POST https://superset/api/v1/dashboard/8/chart/74/data
   ```
   
   When the dashboard has native filters applied to the chart, a `POST` is 
necessary. In that case, **only the additional configuration is passed**. For 
filters:
   
   ```json
   {
     "queries": [
       {
         "filters": [
           {
             "col": "highest_degree_earned",
             "op": "IN",
             "val": [
               "A. No high school (secondary school)"
             ]
           }
         ]
       }
     ]
   }
   ```
   
   Or a custom time grain:
   
   ```json
   {
     "queries": [
       {
         "extras": {
           "time_grain_sqla": "P1M",
           "having": "",
           "where": ""
         }
       }
     ]
   }
   ```
   
   The backend will then be responsible for merging the request with the saved 
chart configuration, in order to produce and return the requested data.
   
   # **New or Changed Public Interfaces**
   
   A new API endpoint will be introduced, as described above.
   
   # **New dependencies**
   
   None.
   
   # **Migration Plan and Compatibility**
   
   The original endpoint should remain, both for compatibility with existing 
clients but mainly because it’s still used by the no-code chart builder 
(”Explore”). A new feature flag, `DASHBOARD_SIMPLE_CHART_API`, will be 
introduced, so that the feature can be tested initially until it’s mature 
enough to become the default workflow. Eventually, the feature flag should be 
removed and only the new proposed API should be used.
   
   # **Rejected Alternatives**
   
   This proposal has some overlap with the `CLIENT_CACHE` feature 
[introduced](https://github.com/apache/superset/pull/7032) by the author in 
Superset **0.34** and [removed](https://github.com/apache/superset/pull/26348) 
in **4.0.0**, though it’s conceptually simpler since it doesn’t introduce an 
explicit Javascript caching layer.
   
   Since the proposal is mostly concerned with removing technical debt, as 
opposed to implementing a new feature, there were no other alternatives 
considered. Doing `GET` requests for stored data and `POST`ing only as much 
data as needed are aligned with best practices in REST APIs.
   
   We discussed moving the dashboard filters to the URL and use only `GET` 
requests, but since most browsers have a limit of ~2000 characters in the URL 
this would break if too many filters are in place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org
For additional commands, e-mail: notifications-h...@superset.apache.org

Reply via email to