### Motivation

Superset currently supports two engine connectors for querying datasources: 
SQLAlchemy and the Druid REST API. The latter was the initial use case for 
Superset, i.e., a UI for visualizing Druid datasources.

Since version 
[0.10.0](https://github.com/apache/incubator-druid/releases/tag/druid-0.10.0) 
Druid has included a built-in SQL server, which has a SQLAlchemy binding 
provided by the [pydruid](https://github.com/druid-io/pydruid) library 
(courtesy of @betodealmeida and @mistercrunch). The proposed change is therefore 
to deprecate the REST API interface in favor of having a single interface 
(SQLAlchemy) to all engines. Note that all future engines (there has been mention 
of adding support for Elasticsearch) would require a SQLAlchemy dialect. 

There is a non-trivial amount of overhead in supporting both connectors, 
including:

#### Code

>From a code perspective, each connector needs to define similar views and 
>models. The 
>[Druid](https://github.com/apache/incubator-superset/tree/master/superset/connectors/druid)
> connector alone comprises around 2,000 lines of code. There is additional 
>frontend logic which needs to construct filters, metrics, etc. for both the 
>Druid REST API and SQLAlchemy. Note there are 
>[74](https://github.com/apache/incubator-superset/search?q=druid&unscoped_q=druid)
> files (including documentation) which reference Druid in the repo. 

#### Models

In addition to the code overhead, each connector defines its own models and 
database tables:

Druid: 
- `clusters`
- `datasources`
- `columns`
- `metrics`

SQLAlchemy: 
- `dbs` 
- `tables`
- `table_columns`
- `sql_metric`

This duplication complicates the logic. For example, the `slices` table does not 
have a SQLAlchemy relationship to a single "datasource" table because the 
datasource type determines the association. This results in denormalized tables 
with potentially incorrect values: the `slices` table contains the 
`datasource_name` column for the FAB CRUD views, but this may not accurately 
reflect the underlying datasource name. 
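
The following is a minimal sketch of the problem described above, not Superset's actual models. The classes, the in-memory registries, and the `resolve_name` helper are all hypothetical and heavily simplified; they only illustrate how a type-dispatched lookup forces a denormalized name that can go stale.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for the two datasource models.
@dataclass
class DruidDatasource:
    id: int
    datasource_name: str


@dataclass
class SqlaTable:
    id: int
    table_name: str


@dataclass
class Slice:
    datasource_id: int
    datasource_type: str  # "druid" or "table"; decides which table to look in
    datasource_name: str  # denormalized copy kept for the CRUD views


# In-memory stand-ins for the `datasources` and `tables` database tables.
DRUID = {1: DruidDatasource(1, "events")}
TABLES = {1: SqlaTable(1, "orders")}


def resolve_name(slc: Slice) -> str:
    """Look up the current datasource name by dispatching on the type,
    since no single foreign key relationship can express the association."""
    if slc.datasource_type == "druid":
        return DRUID[slc.datasource_id].datasource_name
    if slc.datasource_type == "table":
        return TABLES[slc.datasource_id].table_name
    raise ValueError(f"unknown datasource type: {slc.datasource_type}")


slc = Slice(datasource_id=1, datasource_type="druid", datasource_name="events")

# Renaming the datasource does not touch the copy stored on the slice.
DRUID[1].datasource_name = "events_v2"
is_stale = slc.datasource_name != resolve_name(slc)
```

With a single datasource table, the lookup could be an ordinary foreign key and the copied name would no longer be needed.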

#### Proposed Change

The proposed change would be to deprecate all of the Druid REST logic in the 
codebase. This significantly simplifies and streamlines a number of facets of 
Superset by ensuring that all engines connect via a SQLAlchemy dialect.

Currently there is support for syncing/refreshing Druid datasources associated 
with the REST API connector, which I suspect is leveraged by a number of 
organizations. 
[SIP-7](https://github.com/apache/incubator-superset/issues/5842) discusses 
"refreshing" of Superset datasources. 

Note that this would be a breaking change for any organization using a Druid 
version earlier than `0.10.0`. Also, there may be some post-aggregate Druid 
functions which are not supported in Druid SQL.

#### New or Changed Public Interfaces

There would be no new or changed public interfaces.

#### New dependencies

There would be no new dependencies.

#### Migration Plan and Compatibility

A non-trivial database migration would be required, including:
- All records in the Druid tables listed above would need to be migrated to the 
equivalent SQLAlchemy table.
- Existing slices would need to be updated to reference the new SQLAlchemy 
representation of the Druid datasource.
- The `slices` table would need to be re-normalized.
- Chart data would need to be updated to remove the obsolete `table__` or 
`druid__` prefixes.
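
The first two steps could be sketched roughly as follows, using the standard-library `sqlite3` module. This is an illustration under stated assumptions, not the actual migration: the column names (`broker_host`, `broker_port`, `cluster_id`, `database_id`, and so on) are a simplified, hypothetical schema, the `migrate` function is invented for this example, and the migration of columns and metrics is omitted. The `druid://host:port/druid/v2/sql/` URI follows the form shown in the pydruid README.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> dict:
    """Copy Druid clusters/datasources into dbs/tables and repoint slices.

    Returns a mapping of old Druid datasource id -> new `tables` id.
    The schema used here is a simplified assumption, not Superset's real one.
    """
    cur = conn.cursor()

    # 1. Each Druid cluster becomes a row in `dbs` with a Druid SQL URI.
    cluster_to_db = {}
    rows = cur.execute(
        "SELECT id, cluster_name, broker_host, broker_port FROM clusters"
    ).fetchall()
    for cluster_id, name, host, port in rows:
        uri = f"druid://{host}:{port}/druid/v2/sql/"
        cur.execute(
            "INSERT INTO dbs (database_name, sqlalchemy_uri) VALUES (?, ?)",
            (name, uri),
        )
        cluster_to_db[cluster_id] = cur.lastrowid

    # 2. Each Druid datasource becomes a row in `tables`.
    ds_to_table = {}
    rows = cur.execute(
        "SELECT id, datasource_name, cluster_id FROM datasources"
    ).fetchall()
    for ds_id, name, cluster_id in rows:
        cur.execute(
            "INSERT INTO tables (table_name, database_id) VALUES (?, ?)",
            (name, cluster_to_db[cluster_id]),
        )
        ds_to_table[ds_id] = cur.lastrowid

    # 3. Repoint slices. Filtering on the 'druid' type means rows that were
    #    already rewritten to 'table' are never matched a second time.
    for ds_id, table_id in ds_to_table.items():
        cur.execute(
            "UPDATE slices SET datasource_id = ?, datasource_type = 'table' "
            "WHERE datasource_type = 'druid' AND datasource_id = ?",
            (table_id, ds_id),
        )
    conn.commit()
    return ds_to_table


# Tiny in-memory fixture to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clusters (id INTEGER PRIMARY KEY, cluster_name TEXT,
                       broker_host TEXT, broker_port INTEGER);
CREATE TABLE datasources (id INTEGER PRIMARY KEY, datasource_name TEXT,
                          cluster_id INTEGER);
CREATE TABLE dbs (id INTEGER PRIMARY KEY, database_name TEXT,
                  sqlalchemy_uri TEXT);
CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT,
                     database_id INTEGER);
CREATE TABLE slices (id INTEGER PRIMARY KEY, datasource_id INTEGER,
                     datasource_type TEXT);
INSERT INTO clusters VALUES (1, 'main', 'localhost', 8082);
INSERT INTO datasources VALUES (7, 'events', 1);
INSERT INTO dbs VALUES (1, 'warehouse', 'postgresql://example/warehouse');
INSERT INTO tables VALUES (1, 'orders', 1);
INSERT INTO slices VALUES (1, 7, 'druid');
INSERT INTO slices VALUES (2, 1, 'table');
""")
mapping = migrate(conn)
```

Keeping the old-to-new id mapping around is a deliberate choice in this sketch: any later step that rewrites references stored outside the `slices` columns would need it.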

#### Rejected Alternatives

None.

to: @betodealmeida @graceguo-supercat @kristw @michellethomas @mistercrunch 
@timifasubaa 


[ Full content available at: 
https://github.com/apache/incubator-superset/issues/6032 ]