rusackas opened a new issue #9187: [SIP] Visualization plugin refactoring
URL: https://github.com/apache/incubator-superset/issues/9187
 
 
   ## [SIP] Visualization plugin refactoring
   
   
   
   ### Motivation
   
   One of the most commonly reoccurring questions in the Superset community, on 
Slack and elsewhere, is that of how to add a new data visualization. The 
answer, in short, has been “it’s hard.” While that may be true, the goal of 
this SIP is to lay out both tactical refactor needs for the current 
implementation to mature, as well as proposing a handful of roadmap features to 
make plugin development significantly easier. These changes will make upcoming 
modifications of existing plugins (see SIP-34) drastically simpler, and steer 
toward opening an ecosystem of Superset visualization plugins.
   
   Much planning and work has already been done to address the difficulty of 
adding/editing plugins, including a new query API endpoint, but there are many 
blocking issues and code migrations remaining to complete this process. Special 
thanks to @kristw, @williaster, @xtinec, and @conglei for their significant 
contributions to the frontend and API work thus far. These issues, and proposed 
solutions for them, are enumerated below. Additional suggestions are welcome.
   
   ### Proposed Changes
   
   ### General Goals:
   
   - As much code and configuration as possible for individual visualization 
plugins should be moved out of `incubator-superset` and into the individual 
plugin’s repos (in a perfect world, a new plugin wouldn’t require touching two 
repos and opening two PRs).
   - Reduce frustration in working on plugin repos, allowing people to more 
easily see changes as they make them
   
   
   <table>
   <tbody>
   
   <tr>
   <td valign="top">
   
   **Issue:**
   Control panel configurations for visualizations are centralized in a 
difficult to maintain 
[controls.jsx](https://github.com/apache/incubator-superset/blob/master/superset-frontend/src/explore/controls.jsx)
 file. All controls are located in `incubator-superset` , necessitating writing 
code in two PRs for two repos.
   
   </td>
   <td  valign="top">
   
   **Proposal:**
   Control configurations (particularly the ones that are unique to any given 
plugin) should be migrated into the correct individual [control panel config 
files](https://github.com/apache/incubator-superset/tree/master/superset-frontend/src/explore/controlPanels)
 . An example of this can be found in [This 
PR](notion://www.notion.so/preset/%5B%3Chttps://github.com/apache/incubator-superset/pull/8222/files%3E%5D).
 These indivitual configs should then be migrated to the individual plugins, 
and references removed from `setupPlugins.ts`.
   
   </td>
   </tr>
   
   
   
   <tr>
   <td valign="top">
   
   **Issue:**
   Control panel configurations for visualizations are centralized in a 
difficult to maintain 
[controls.jsx](https://github.com/apache/incubator-superset/blob/master/superset-frontend/src/explore/controls.jsx)
 file. All controls are located in `incubator-superset` , necessitating writing 
code in two PRs for two repos.
   
   </td>
   <td  valign="top">
   
   **Proposal:**
   Control configurations (particularly the ones that are unique to any given 
plugin) should be migrated into the correct individual [control panel config 
files](https://github.com/apache/incubator-superset/tree/master/superset-frontend/src/explore/controlPanels)
 . An example of this can be found in [This 
PR](notion://www.notion.so/preset/%5B%3Chttps://github.com/apache/incubator-superset/pull/8222/files%3E%5D).
 These indivitual configs should then be migrated to the individual plugins, 
and references removed from `setupPlugins.ts`
   
   </td>
   </tr>
   
   <tr>
   <td valign="top">
   
   **Issue:**
   Plugins (particularly when using the legacy api (`/explore_json`) require an 
entry in 
[viz.py](https://github.com/apache/incubator-superset/blob/master/superset/viz.py).
  In addition to requiring code changes to two repos, the logic in 
[viz.py](https://github.com/apache/incubator-superset/blob/master/superset/viz.py)
 has proven to be fragile and cumbersome to maintain.
   
   </td>
   <td  valign="top">
   
   **Proposal:**
   Use of `viz.py` should be deprecated in favor of the viz-agnostic 
`api/v1/query` endpoint. In an effort to decouple this, `viz.py` logic (data 
transformations) should be broken out into individual modules and/or reusable 
methods, which should be invoked by the new endpoint. This will additionally 
require that controls should be consolidated wherever possible, e.g. use a 
single control for `metric`, `metrics`, `metric_2`, `secondary_metric`, etc.
   
   </td>
   </tr>
   
   
   <tr>
   <td valign="top">
   
   **Issue:**
   New and existing plugins cannot yet fully utilize the new 
`api/v1/query`endpoint due to the following issues:
   - Superset does not yet respect a plugin’s `useLegacy` flag to call the 
correct endpoint when required
   - The API has no means to accept data transformation options needed for 
post-processing (e.g. Pandas) to reach feature parity with the legacy API.
   - The API does not have unit tests
   - The API is not documented
   
   </td>
   <td  valign="top">
   
   **Proposal:**
   - Modify 
[exploreUtils.js](https://github.com/conglei/incubator-superset/blob/5ea282e45fc5d7a8f7fbe93641d6caf2825abd25/superset/assets/src/explore/exploreUtils.js)
 such that the `getURIDirectory` method calls the right endpoint depending on 
the `useLegacy` flag
   - Add configuration options to the API call to invoke backend 
post-processing operations, returning transformed data
   - Write unit tests and documentation
   </td>
   </tr>
   
   
   
   <tr>
   <td valign="top">
   
   **Issue:**
   Each plugin must be registered manually in `incubator-superset`’s 
`MainPreset.js` file. Additionally, customizing the plugins loaded for a 
deployment (i.e. disabling some) is done via 
[setupPluginsExtra.js](https://github.com/apache/incubator-superset/blob/master/superset-frontend/src/setup/setupPluginsExtra.js),
 meaning the plugins are still loaded as dependencies. And this method only 
supports plugins removal, but does not let you add new plugins that are not 
listed in `package.json` from `master`.
   
   </td>
   <td  valign="top">
   
   **Proposal:**
   Attempting to load plugins via [ES2020’s dynamic 
imports](https://github.com/tc39/proposal-dynamic-import). The exact 
implementation of this is a bit TBD, but the idea would be to move the 
responsibility for registering/loading plugins away from `MainPreset.js`. 
Instead, the plugin paths/packages (and their associated `keys`) could be 
bootstrapped as an overridable configuration file, and Superset could lazy-load 
the plugins accordingly. (note: dynamic imports are not [supported 
natively](https://caniuse.com/#feat=es6-module-dynamic-import) by IE, but Babel 
provides [potential 
recourse](https://babeljs.io/docs/en/babel-plugin-syntax-dynamic-import) for 
that).
   
   </td>
   </tr>
   
   <tr>
   <td valign="top">
   
   **Issue:**
   Development work on plugins requires manually running a `npm link` operation 
to load the local plugin, and thus see updates/edits in Superset - this is 
troublesome in that it is both fragile, and difficult for many developers to 
discover, as it’s not a common pattern).
   
   </td>
   <td  valign="top">
   
   **Proposal:**
   Automate the process! Create a “plugin dev mode” NPM script that 
automatically links (or unlinks) viz plugin packages. See a working example of 
this concept in [this 
PR](https://github.com/apache/incubator-superset/pull/8638). This would involve 
refactoring the `NVD3` plugins to not rely on `/lib` path, `preset-chart-xy` to 
not reply on `/esm` path - all plugins should follow the same build and source 
directory pattern.
   
   </td>
   </tr>
   
   </tbody>
   </table>
   
   ### Additional (follow-up) refactoring tasks
   
   - Follow CSS-in-JS patterns (see SIP-37) in viz components, sharing common 
theme styles/variables with `incubator-superset`. Theme variables may need to 
be moved to `superset-ui` to be consumed by both `superset-ui-plugins` and 
`incubator-superset`.
   - Audit and address issues with, and completeness of, i18n of plugin text.
   - Converting all viz components to TypeScript (see SIP-36)
   
   ### New or Changed Public Interfaces
   
   The query endpoint at `/api/v1/query` needs significant enhancement, as laid 
out in the proposals above (post-processing options, tests, docs).
   
   ### New dependencies
   
   N/A
   
   ### Migration Plan and Compatibility
   
   N/A
   
   ### Rejected Alternatives
   
   - **Reintroducing viz plugins into incubator-superset**
   Having the plugins be in their own repos is troublesome from a workflow 
perspective (due to the multiple PRs required, NPM Link work needed, and 
separate build processes required). The proposals laid out above seek to 
minimize this difficulty. While it is certainly possible (and indeed likely 
easier) to move the plugins back into Superset itself (like 
[Redash](https://github.com/getredash/redash/tree/master/client/app/visualizations)
 and 
[Metabase](https://github.com/metabase/metabase/tree/master/frontend/src/metabase/visualizations/visualizations)
 do), solving these more difficult problems seems more likely to open the door 
to a true plugin ecosystem for Superset.
   - **Moving data transformations to plugins (JS), deprecating Pandas**
   The idea has been floated that perhaps data transformation (at least in some 
cases) might be more the responsibility of the viz plugin itself than the 
backend, and maybe if we moved that logic, we could deprecate Pandas. To test 
the theory, some basic benchmarking attempts were made on large rollup and 
pivot tasks, to compare the performance of Pandas against 
[Zebras](https://github.com/nickslevine/zebras), 
[Datalib](https://github.com/vega/datalib), [Ramda](https://ramdajs.com/), and 
[Lodash](https://lodash.com/). This approach, at least as a global migration, 
was decided against for these reasons:
       - Sending an entire dataset over the wire, if the frontend just needs a 
rollup, is a waste of resources
       - If post-processing is done on the backend, the result can be cached 
for use by multiple charts (or multiple clients and reloads)
       - Neither Zebras nor Datalib provides an out-of-the-box pivot function 
on par with Pandas, and the `pivotWith` "recipe" from the Ramda cookbook looked 
to be significantly slower than Pandas (approx 10x).
       - All these libraries provide grouping, sorting, map/reduce 
functionality, so you can pivot the data manually. But then, so does 
[Lodash](https://lodash.com/), which matched (or slightly beat) the other JS 
libraries' performance. This was still about 2x slower than Pandas.
       - **TL;DR**: If you want to avoid writing Python for a new viz or 
calling it through the new API, and want to do a little data munging on the 
frontend, just use lodash or vanilla JS for best results.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to