[GitHub] villebro commented on issue #5827: [WIP] Improve column/alias handling for case insensitive engines

GitBox Tue, 11 Sep 2018 23:22:32 -0700

villebro commented on issue #5827: [WIP] Improve column/alias handling for case 
insensitive engines
URL: 
https://github.com/apache/incubator-superset/pull/5827#issuecomment-420526165
 
 
   @mistercrunch I agree that this might seem excessive, but let me explain the 
reasoning behind the changes:
   
   The main argument is that `viz.py` should be independent of database type. 
In order to achieve this, any column names or labels in `query_obj` need to be 
in their original format. Let's assume that we have a metric with the label 
`SUM(col)`. In the case of Redshift and BigQuery these would cause the 
following:
   - **Redshift**: the resulting dataframe will contain a column called 
`sum(col)` (gets automatically mutated by the database)
   - **Bigquery**: will fail unless the label is mutated.
   
   Currently (in `master`) BigQuery solves this by replacing the parens with 
underscores, resulting in a column called `SUM_col_`. Aside from the mostly 
theoretical risk of collisions, this introduces a discrepancy between the 
dataframe and `form_data`/`query_obj`. Furthermore, the mapping from metric 
name to verbose name used by all charts doesn't work (`data.verbose_map` in 
`/connectors/base/models.py`) unless that is also made aware of the new mutated 
labels. The way I see it there are two ways around this:
   1) Either the columns in the dataframe returned by 
`/connectors/sqla/models.py` need to be renamed to their original state prior 
to being passed to their respective Viz, or
   2) The Viz need to be aware that some labels have changed (in this case 
`SUM_col_` actually refers to `SUM(col)`).
   
   What this PR attempts to do is move from 2) to 1), i.e. encapsulate all SQLA 
specific logic in `/connectors/sqla/models.py`. In this proposal this has been 
done by pushing around a dict that collects all mutated labels in a dict, and 
renames the dataframe columns to their original state prior to being returned. 
I agree that this looks clumsy, but seemed like the best solution at the time. 
This can probably be refactored to something more maintainable/understandable. 
Where this approach adds complexity to SQLA models logic, this decouples Viz 
logic completely from the database backend, which I think is a good thing.
   
   While it might appear excessively complicated, I think the heterogeneous 
nature of the SQLA ecosystem seems to require a lot of flexibility from the 
backend to be able to conform to the quirks of every individual engine. 
However, if this still feels like the wrong approach I am open to suggestions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] villebro commented on issue #5827: [WIP] Improve column/alias handling for case insensitive engines

Reply via email to