### Motivation

Superset currently supports two engine connectors for querying datasources: SQLAlchemy and the Druid REST API. The latter was the initial use case for Superset, i.e., a UI for visualizing Druid datasources.

Since version [0.10.0](https://github.com/apache/incubator-druid/releases/tag/druid-0.10.0) Druid has included a built-in SQL server which has a SQLAlchemy binding provided by the [pydruid](https://github.com/druid-io/pydruid) library (courtesy of @betodealmeida and @mistercrunch). The proposed change is therefore to deprecate the REST API interface in favor of having a single interface (SQLAlchemy) to all engines. Note that all future engines (there has been mention of adding support for Elasticsearch) would require a SQLAlchemy dialect.

There is a not insignificant amount of overhead in supporting both connectors, including:

#### Code

From a code perspective each connector needs to define similar views and models. The [Druid](https://github.com/apache/incubator-superset/tree/master/superset/connectors/druid) connector alone comprises around 2,000 lines of code. There is additional frontend logic which needs to construct filters, metrics, etc. for both the Druid REST API and SQLAlchemy. Note there are [74](https://github.com/apache/incubator-superset/search?q=druid&unscoped_q=druid) files (including documentation) which reference Druid in the repo.

#### Models

In addition to code overhead each connector defines its own models and database tables:

Druid:

- `clusters`
- `datasources`
- `columns`
- `metrics`

SQLAlchemy:

- `dbs`
- `tables`
- `table_columns`
- `sql_metric`

This complicates logic, e.g., the `slices` table does not have a SQLAlchemy relationship to a "datasource" table as the datasource type determines the association. This results in denormalized tables with potentially incorrect values, e.g., the `slices` table contains the `datasource_name` column for the FAB CRUD views, however this may not accurately reflect the underlying datasource name.

#### Proposed Change

The proposed change would be to deprecate all the Druid REST logic from the codebase.
This significantly simplifies and streamlines a number of facets of Superset by ensuring that all engines connect via a SQLAlchemy dialect.

Currently there is support for syncing/refreshing Druid datasources associated with the REST API connector, which I suspect is leveraged by a number of organizations. [SIP-7](https://github.com/apache/incubator-superset/issues/5842) discusses "refreshing" of Superset datasources.

Note this would be a breaking change for any organization using a Druid version less than `0.10.0`. Also there may be some instances of post-aggregate Druid functions which are not supported in Druid SQL.

#### New or Changed Public Interfaces

There would be no new or changed public interfaces.

#### New dependencies

There would be no new dependencies.

#### Migration Plan and Compatibility

A non-trivial database migration would be required including:

- All records in the Druid tables listed above would need to be migrated to the equivalent SQLAlchemy table.
- Existing slices would need to be updated to reference the new SQLAlchemy representation of the Druid datasource.
- Re-normalize the `slices` table.
- Update chart data to remove the obsolete `table__` or `druid__` prefixes.

#### Rejected Alternatives

None.

to: @betodealmeida @graceguo-supercat @kristw @michellethomas @mistercrunch @timifasubaa

[ Full content available at: https://github.com/apache/incubator-superset/issues/6032 ]
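As a rough illustration of the first two steps under Migration Plan and Compatibility, the sketch below copies Druid datasource records into the SQLAlchemy `tables` table and repoints slices at the new rows. It is only a sketch under stated assumptions: the schema is heavily simplified, the columns `id`, `table_name`, `datasource_id` and `datasource_type` are assumed for illustration, the sample rows are made up, and the helper names `migrate_druid_datasources` and `build_demo_db` are hypothetical. It uses an in-memory SQLite database rather than Superset's actual migration tooling.

```python
import sqlite3


def migrate_druid_datasources(conn: sqlite3.Connection) -> dict:
    """Copy Druid datasources into `tables` and repoint slices at the copies.

    Returns a mapping of old Druid datasource id -> new `tables` id.
    Column names here are assumptions, not Superset's full schema.
    """
    cur = conn.cursor()
    id_map = {}
    rows = cur.execute("SELECT id, datasource_name FROM datasources").fetchall()
    for druid_id, name in rows:
        # Create the SQLAlchemy-side record and remember its new id.
        cur.execute("INSERT INTO tables (table_name) VALUES (?)", (name,))
        id_map[druid_id] = cur.lastrowid
    for druid_id, table_id in id_map.items():
        # Only rows still typed 'druid' match, so an updated slice is
        # never touched a second time.
        cur.execute(
            "UPDATE slices SET datasource_type = 'table', datasource_id = ? "
            "WHERE datasource_type = 'druid' AND datasource_id = ?",
            (table_id, druid_id),
        )
    conn.commit()
    return id_map


def build_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE datasources (id INTEGER PRIMARY KEY, datasource_name TEXT);
        CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT);
        CREATE TABLE slices (
            id INTEGER PRIMARY KEY,
            datasource_id INTEGER,
            datasource_type TEXT
        );
        INSERT INTO datasources VALUES (1, 'wikipedia');
        INSERT INTO tables VALUES (1, 'birth_names');
        INSERT INTO slices VALUES (10, 1, 'druid'), (11, 1, 'table');
        """
    )
    return conn


conn = build_demo_db()
mapping = migrate_druid_datasources(conn)
print(mapping)
print(conn.execute(
    "SELECT id, datasource_id, datasource_type FROM slices ORDER BY id"
).fetchall())
```

Keeping an explicit old-to-new id mapping matters because a Druid datasource and an existing SQLAlchemy table can share the same numeric id, as they do in the sample rows. A real migration would also need to carry over columns and metrics, which this sketch omits.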
