[GitHub] [superset] rumbin opened a new issue, #19773: [SIP-82] Case-insensitive handling of datasets' column names

GitBox Tue, 19 Apr 2022 03:05:07 -0700


rumbin opened a new issue, #19773:
URL: https://github.com/apache/superset/issues/19773


   
   !!! warning "This document is still WIP; review by @villebro and 
@agusfigueroa-htg is required."
   
   -------------
   
   
   ## [SIP-82] Case-insensitive handling of datasets' column names
   
   ### Motivation
   
   The default case (upper/lower) and the case sensitivity of object names 
(schemas, tables, columns, ...) are handled very differently across the various 
DBMSes that are supported by Superset.  
   E.g., Postgres interprets unquoted column names as lowercase while Oracle 
and Snowflake treat them as UPPERCASE.
   
   Superset does not currently treat the case of column names consistently. As 
a result, the columns of _virtual_ datasets on an UPPERCASE DB like Snowflake 
are represented in UPPERCASE, while _physical_ datasets on the same DB have 
lowercase column names in Superset.
   See #18085 for more details and discussion on how this ought to be fixed.
   
   The main issues that arise from this inconsistency are:
   
   1. Dashboard filters refer to case-sensitive representations of the columns. 
If a dashboard contains charts that are based on physical _and_ virtual 
datasets, the filters will only be applied to the charts where the case of the 
column name matches.
   2. If a physical dataset is later changed to become a virtual dataset (or 
vice versa), the case of the column names changes and existing charts and 
filters will break. Such changes are pretty common, e.g., when a virtual 
dataset is promoted to become a view in the database or when an existing table 
needs some more logic applied (e.g. filtering of soft-deleted records).
   3. Migration of the data warehouse system (e.g. from Postgres to Snowflake, 
while reproducing the data marts) may change the case of the column names, 
thus breaking existing charts.
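
To make issue 1 more tangible, here is a deliberately simplified sketch. It is not Superset code: `datasets_in_scope` and the dataset names are made up for illustration, and I am assuming a filter applies to a dataset only if a column with the filter's name exists on it.

```python
# Hypothetical, simplified model (not Superset's actual filter logic):
# a filter applies to a dataset only if the dataset exposes a column
# whose name equals the filter's column name.
def datasets_in_scope(filter_column, datasets, case_sensitive=True):
    """Return the names of the datasets the filter would apply to."""
    matched = []
    for name, columns in datasets.items():
        if case_sensitive:
            hit = filter_column in columns
        else:
            hit = filter_column.lower() in {c.lower() for c in columns}
        if hit:
            matched.append(name)
    return matched


datasets = {
    "orders_physical": ["order_id", "country"],  # lowercase, like a physical dataset
    "orders_virtual": ["ORDER_ID", "COUNTRY"],   # UPPERCASE, like a virtual dataset
}

print(datasets_in_scope("country", datasets))
print(datasets_in_scope("country", datasets, case_sensitive=False))
```

With case-sensitive matching only `orders_physical` is in scope; with case-insensitive matching both datasets are.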
   
   
   ### Proposed Change
   
   In order to find a database-agnostic solution which does not require 
upstream changes to SQLAlchemy drivers, this issue may best be tackled by 
making Superset handle column names _case-insensitively_. I.e., all columns 
should _internally_ be treated as lowercase.
   
   There is a small risk of datasets having two columns that would translate to 
the same case-insensitive (lowercase) representation of the column name. 
However, @villebro 
[feels](https://github.com/apache/superset/issues/18085#issuecomment-1102213239)
 that only very few people would really have a need to distinguish columns 
based on their case.
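
Such collisions could at least be detected up front. A minimal sketch, assuming plain `str.lower()` is the normalization being used; `find_case_collisions` is a hypothetical helper, not an existing Superset function:

```python
from collections import defaultdict


def find_case_collisions(column_names):
    """Group column names that share the same lowercase form.

    Returns a dict mapping the lowercase name to the list of original
    names, restricted to groups with more than one member.
    """
    groups = defaultdict(list)
    for name in column_names:
        groups[name.lower()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


print(find_case_collisions(["id", "ID", "CustomerName", "amount"]))
```

A dataset for which this returns a non-empty result would need special handling (or at least a warning) before its columns are lowercased.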
   
   At the same time, we need to ensure that e.g. CamelCase column names keep 
their human readability. Thus, I suggest auto-filling the `label` of the 
dataset column (a.k.a. the `verbose_name`) with its original, case-sensitive 
name wherever this field is not already filled (existing information must not 
be overwritten).
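
A rough sketch of how this could look. The `Column` class below is a stand-in I made up for illustration, not Superset's actual model, and `column_name` and `normalize_column` are assumed names; only `verbose_name` is taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    # Stand-in for a dataset column; not Superset's real model class.
    column_name: str
    verbose_name: Optional[str] = None


def normalize_column(col: Column) -> Column:
    """Lowercase the column name and preserve the original as the label.

    The label is only filled if it is empty; an existing label is never
    overwritten.
    """
    label = col.verbose_name if col.verbose_name else col.column_name
    return Column(column_name=col.column_name.lower(), verbose_name=label)


print(normalize_column(Column("OrderDate")))
print(normalize_column(Column("OrderDate", verbose_name="Date of order")))
```

In the first call the label becomes `OrderDate`; in the second call the existing label `Date of order` is kept as it is.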
   
   ### New or Changed Public Interfaces
   
   @villebro, @agusfigueroa-htg - I need your input regarding this and the 
following sections of this SIP...
   
   
   ### New dependencies
   
   
   ### Migration Plan and Compatibility
   
   
   ### Rejected Alternatives
   
   Consistency could be introduced on a per-DBMS basis, i.e. per SQLAlchemy 
driver, so all datasets in UPPERCASE DBMSes would be represented in UPPERCASE, 
regardless of whether they are physical or virtual datasets. This would fix the 
aforementioned issues 1 and 2. However, the 3rd issue would not be covered. 
Furthermore, this approach may be more error-prone when introducing support 
for more DBMSes or when upstream changes occur.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


