@mistercrunch I agree that this might seem excessive, but let me explain the 
reasoning behind the changes:

The main argument is that `viz.py` should be independent of database type. In 
order to achieve this, any column names or labels in `query_obj` need to be in 
their original format. Let's assume that we have a metric with the label 
`SUM(col)`. In the case of Redshift and BigQuery these would cause the 
following:
- **Redshift**: the resulting dataframe will contain a column called `sum(col)` 
(gets automatically mutated by the database)
- **Bigquery**: will fail unless the label is mutated.

Currently (in `master`) BigQuery solves this by replacing the parens with 
underscores, resulting in a column called `SUM_col_`. Aside from the mostly 
theoretical risk of collisions, this introduces a discrepancy between the 
dataframe and `form_data`/`query_obj`. Furthermore, the mapping from metric 
name to verbose name used by all charts doesn't work (`data.verbose_map` in 
`/connectors/base/models.py`) unless that is also made aware of the new mutated 
labels. The way I see it there are two ways around this:
1) Either the columns in the dataframe returned by `/connectors/sqla/models.py` 
need to be renamed to their original state prior to being passed to their 
respective Viz, or
2) The Viz need to be aware that some labels have changed (in this case 
`SUM_col_` actually refers to `SUM(col)`).

What this PR attempts to do is move from 2) to 1), i.e. encapsulate all SQLA 
specific logic in `/connectors/sqla/models.py`. In this proposal this has been 
done by pushing around a dict that collects all mutated labels in a dict, and 
renames the dataframe columns to their original state prior to being returned. 
I agree that this looks clumsy, but seemed like the best solution at the time. 
This can probably be refactored to something more maintainable/understandable. 
Where this approach adds complexity to SQLA models logic, this decouples Viz 
logic completely from the database backend, which I think is a good thing.

While it might appear excessively complicated, I think the heterogeneous nature 
of the SQLA ecosystem seems to require a lot of flexibility from the backend to 
be able to conform to the quirks of every individual engine. However, if this 
still feels like the wrong approach I am open to suggestions.

[ Full content available at: 
https://github.com/apache/incubator-superset/pull/5827 ]
This message was relayed via gitbox.apache.org for [email protected]

Reply via email to