villebro commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017459122


   Ok, so this all stems from the design decision Oracle-like databases have 
made, which I generally call "lowercase means uppercase means caseless", which 
IMO is a terrible assumption (I _really_ wish Snowflake hadn't gone with this 
design). I may be off on some details, but basically it boils down to this: 
When you write
   
   ```sql
   select column
   from table
   ```
   
   and the column name is caseless (=stored in metadata as UPPERCASE), the 
resulting column name in the query result is `COLUMN`. You get the same result 
by writing (correct me if I misremember these)
   
   ```sql
   select Column
   from table
   ```
   
   or
   
   ```sql
   select COLUMN
   from table
   ```
   
   or even
   
   ```sql
   select "COLUMN"
   from table
   ```
   
   but an error when writing
   
   ```sql
   select "column"
   from table
   ```
   
   because the column name isn't defined case-sensitively as "column", but 
rather stored in the metadata as "COLUMN" (=caseless). However, when you check 
the column name using `inspector.get_columns`, the column shows up as `column`, 
i.e. lowercase, which kind of isn't really true using Oracle logic, but ends up 
working in Superset. The Snowflake dialect does these to/from conversions here: 
https://github.com/snowflakedb/snowflake-sqlalchemy/blob/9118cf8f18a0039f9cb5d3892ff2b1e5c82a05e0/snowdialect.py#L217-L234
 , and these functions kind of make sense when you know the assumptions behind 
case handling, but is super confusing when you're trying to write code that 
handles the cases correctly.
   
   Now, we actually have a method in `BaseEngineSpec` called `get_column_spec` 
which takes the source of the column definition as one argument: 
https://github.com/apache/superset/blob/035638c95869d0d4b7c9d44f13e88e3535ecdf4c/superset/db_engine_specs/base.py#L1313-L1320
   What we could do here is add a field `normalized_name` to the `ColumnSpec` 
type and return the normalized column name on affected dbs (Oracle, Snowflake 
et) if the column metadata originates from a query (=virtual table), but leave 
it unchanged for physical tables. This would just need to be implemented in the 
SQLAlchemy model and shouldn't be lots of work, but it would be really 
important to do super thorough testing to make sure it handles all cases 
correctly (caseless, UPPERCASE, "lowercase" and "Mixed Case").
   
   This probably sounds really confusing, so maybe it makes sense to have a 
kickoff call to get this work started to make sure we fully understand the 
problem and have a clear proposal for solving this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org
For additional commands, e-mail: notifications-h...@superset.apache.org

Reply via email to