paul-rogers opened a new pull request, #13777:
URL: https://github.com/apache/druid/pull/13777

   DruidDruid has an `INFORMATION_SCHEMA` to provide metadata about Druid 
tables and columns. Historically, columns that contain numbers (such as 
`ORDINAL_POSITION`) were represented as `VARCHAR`. This seems to be due to an 
implementation limitation. Some numeric columns (such as `NUMERIC_PRECISION`) 
are valid only some of the time, and require SQL `NULL` values other times. The 
schema of the information schema columns was describe using Druid types, and 
Druid does not seem to provide a nullable `BIGINT` column. Thus, the only way 
to set `NUMERIC_PRECISION` to `NULL` is to define it as a Druid `string`.)
   
   While this works, it has the unfortunate result of sorting incorrectly. 
Consider the following query:
   
   ```sql
   SELECT ORDINAL_POSITION, COLUMN_NAME
   FROM INFORMATION_SCHEMA.COLUMNS
   WHERE TABLE_NAME = 'foo'
   ORDER BY ORDINAL_POSITION
   ```
   
   If the table has more than ten columns, the ordering  of the ordinals is 1, 
11, 12, 2, ... because "11" sorts as a string before "2". Needless to say, this 
is quite surprising. It makes it hard for a client to obtain columns in ordinal 
order. (The workaround is to _not_ sort since the columns are produced in the 
correct order, but this is a somewhat silly workaround.)
   
   The solution is to use SQL types, not Druid types, to declare the table 
schema. After all, the information schema tables are processed by the Linq4j 
engine, which understands SQL types. We had to (incorrectly) translate the 
Druid types to SQL.
   
   After making this change, the ordinals do, in fact, sort in numeric order.
   
   In testing this change, I noticed that Druid numbers columns counting from 
0. However, SQL prefers counting from 1. So, this change also modifies the 
column ordinals to start with 1.
   
   Updated the documentation to use SQL types (not Druid types) in the 
description of the system tables. These tables can only be accessed via SQL and 
thus only the SQL types are relevant.
   
   #### Release note
   
   The affect of this change is to make the `INFORMATION_SCHEMA` tables more 
SQL-like. If a client was depending on numeric fields being strings, then that 
client may produce unexpected results. Similarly, if a client was expecting 
columns to be numbered from 0, then the new 1-based numbering may impact the 
client. Otherwise, clients will find that queries against `INFORMATION_SCHEMA` 
tables act more like a user would expect: more like they act in other databases.
   
   <hr>
   
   This PR has:
   
   - [X] been self-reviewed.
   - [X] added documentation for new or modified features or behaviors.
   - [X] a release note entry in the PR description.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to