betodealmeida opened a new issue, #30816: URL: https://github.com/apache/superset/issues/30816
# [SIP] A simpler chart API for dashboards # Motivation Historically, the Apache Superset dashboard has been built on top its interactive, no-code, chart builder (”Explore”). While it makes sense to reuse the chart API endpoints for fetching the data for charts in dashboards, in order to make development faster, at the end of the day these are two different workflows: one is an interactive chart *creator*, the other is a mostly passive chart *consumer*. This distinction has left considerable technical debt behind. In order for a dashboard to load a given chart it needs to perform a `POST` request to the the chart API, submitting a request payload that includes the `query` and `form_data` needed by the endpoint. This data is redundant, since it’s already available to the backend (in fact, it needs to be fetched from the backend before the `POST`) and attached to the chart ID (which is also passed in the request). This causes the following problems: 1. Dashboards need to load the full data for all of its charts, instead of just their IDs. This can cause noticeable latency (in our case we’ve observed a latency of ~400ms for larger dashboards, and of course this could potentially be much higher). 2. For each chart, a `POST` request is done sending back the data that was just provided by the backend. Not only this is unnecessary, but it opens an attack vector where users can modify the form data in order to fetch unauthorized data — for example, in an embedded dashboard. While this should never be possible by design, there were multiple CVE reports exploiting this channel. 3. While the `POST` requests still benefit from the Superset cache (if one is configured), they won’t benefit from other caches along the way (reverse proxies, browser cache). Replacing some of these with `GET` requests will provide for better dashboard performance and fewer costs, since the caching can be done upstream, as long as response headers are correctly returned by Superset. # Proposed change Currently, in order to retrieve the data needed to render a chart with ID **74**, a dashboard with ID **8** will issue a request like this: ```bash POST https://superset/api/v1/chart/data?form_data={"slice_id":74}&dashboard_id=8 ``` The request payload is not small: ```json { "datasource": { "id": 19, "type": "table" }, "force": false, "queries": [ { "filters": [ { "col": "time_start", "op": "TEMPORAL_RANGE", "val": "No filter" } ], "extras": { "having": "", "where": "" }, "applied_time_extras": {}, "columns": [ "first_time_developer" ], "metrics": [ "count" ], "orderby": [ [ "count", false ] ], "annotation_layers": [], "row_limit": 100, "series_limit": 0, "order_desc": true, "url_params": { "native_filters_key": "UrzGasKJgA6DRj9-qiGtBuAiT0qEW516UqzMPUNLsq9fqY-eS4REkBL_MEcLnYW1" }, "custom_params": {}, "custom_form_data": {} } ], "form_data": { "datasource": "19__table", "viz_type": "pie", "slice_id": 74, "url_params": { "native_filters_key": "UrzGasKJgA6DRj9-qiGtBuAiT0qEW516UqzMPUNLsq9fqY-eS4REkBL_MEcLnYW1" }, "groupby": [ "first_time_developer" ], "metric": "count", "adhoc_filters": [ { "clause": "WHERE", "comparator": "No filter", "expressionType": "SIMPLE", "operator": "TEMPORAL_RANGE", "subject": "time_start" } ], "row_limit": 100, "sort_by_metric": true, "color_scheme": "supersetColors", "show_labels_threshold": 5, "show_legend": true, "legendType": "scroll", "legendOrientation": "top", "label_type": "key", "number_format": "SMART_NUMBER", "date_format": "smart_date", "show_labels": true, "labels_outside": true, "outerRadius": 70, "innerRadius": 30, "dashboards": [ 8 ], "extra_form_data": {}, "label_colors": {}, "shared_label_colors": { "No Answer": "#1FA8C9", "Yes": "#454E7C", "No": "#5AC189" }, "extra_filters": [], "dashboardId": 8, "force": false, "result_format": "json", "result_type": "full" }, "result_format": "json", "result_type": "full" } ``` Again: not only the payload is relatively big, but it’s also completely unnecessary, since it has to be requested from the backend before the `POST` request is made. If the dashboard has native filters in place, they will be passed as additional filters in the request payload (in bold below): ```json { "datasource": { "id": 19, "type": "table" }, "queries": [ { "filters": [ { "col": "highest_degree_earned", "op": "IN", "val": [ "A. No high school (secondary school)" ] }, { "col": "time_start", "op": "TEMPORAL_RANGE", "val": "No filter" } ], } ], ... "form_data": { "datasource": "19__table", "slice_id": 74, ... "extra_form_data": { "filters": [ { "col": "highest_degree_earned", "op": "IN", "val": [ "A. No high school (secondary school)" ] } ] } } } ``` This SIP proposes that **chart requests without any applied native filters are to be replaced with `GET` requests** to a new API endpoint: ```bash GET https://superset/api/v1/dashboard/8/chart/74/data ``` This has multiple benefits: 1. The chart payload doesn’t need to be downloaded before the request, since with the change only the IDs of the chart in a given dashboard are needed. 2. The request payload cannot be modified by malicious users. 3. If the resource is cached by the backend, the API can set the correct HTTP headers so that the browser caches the response. During the lifetime of the cache the browser won’t need to do any network requests at all. Uses are still able to force-refresh the chart, in which case a simple `POST` request without a request payload is issued: ```bash POST https://superset/api/v1/dashboard/8/chart/74/data ``` When the dashboard has native filters applied to the chart, a `POST` is necessary. In that case, **only the additional configuration is passed**. For filters: ```json { "queries": [ { "filters": [ { "col": "highest_degree_earned", "op": "IN", "val": [ "A. No high school (secondary school)" ] } ] } ] } ``` Or a custom time grain: ```json { "queries": [ { "extras": { "time_grain_sqla": "P1M", "having": "", "where": "" } } ] } ``` The backend will then be responsible for merging the request with the saved chart configuration, in order to produce and return the requested data. # **New or Changed Public Interfaces** A new API endpoint will be introduced, as described above. # **New dependencies** None. # **Migration Plan and Compatibility** The original endpoint should remain, both for compatibility with existing clients but mainly because it’s still used by the no-code chart builder (”Explore”). A new feature flag, `DASHBOARD_SIMPLE_CHART_API`, will be introduced, so that the feature can be tested initially until it’s mature enough to become the default workflow. Eventually, the feature flag should be removed and only the new proposed API should be used. # **Rejected Alternatives** This proposal has some overlap with the `CLIENT_CACHE` feature [introduced](https://github.com/apache/superset/pull/7032) by the author in Superset **0.34** and [removed](https://github.com/apache/superset/pull/26348) in **4.0.0**, though it’s conceptually simpler since it doesn’t introduce an explicit Javascript caching layer. Since the proposal is mostly concerned with removing technical debt, as opposed to implementing a new feature, there were no other alternatives considered. Doing `GET` requests for stored data and `POST`ing only as much data as needed are aligned with best practices in REST APIs. We discussed moving the dashboard filters to the URL and use only `GET` requests, but since most browsers have a limit of ~2000 characters in the URL this would break if too many filters are in place. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org For additional commands, e-mail: notifications-h...@superset.apache.org