paul-rogers opened a new pull request, #13777: URL: https://github.com/apache/druid/pull/13777
DruidDruid has an `INFORMATION_SCHEMA` to provide metadata about Druid tables and columns. Historically, columns that contain numbers (such as `ORDINAL_POSITION`) were represented as `VARCHAR`. This seems to be due to an implementation limitation. Some numeric columns (such as `NUMERIC_PRECISION`) are valid only some of the time, and require SQL `NULL` values other times. The schema of the information schema columns was describe using Druid types, and Druid does not seem to provide a nullable `BIGINT` column. Thus, the only way to set `NUMERIC_PRECISION` to `NULL` is to define it as a Druid `string`.) While this works, it has the unfortunate result of sorting incorrectly. Consider the following query: ```sql SELECT ORDINAL_POSITION, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'foo' ORDER BY ORDINAL_POSITION ``` If the table has more than ten columns, the ordering of the ordinals is 1, 11, 12, 2, ... because "11" sorts as a string before "2". Needless to say, this is quite surprising. It makes it hard for a client to obtain columns in ordinal order. (The workaround is to _not_ sort since the columns are produced in the correct order, but this is a somewhat silly workaround.) The solution is to use SQL types, not Druid types, to declare the table schema. After all, the information schema tables are processed by the Linq4j engine, which understands SQL types. We had to (incorrectly) translate the Druid types to SQL. After making this change, the ordinals do, in fact, sort in numeric order. In testing this change, I noticed that Druid numbers columns counting from 0. However, SQL prefers counting from 1. So, this change also modifies the column ordinals to start with 1. Updated the documentation to use SQL types (not Druid types) in the description of the system tables. These tables can only be accessed via SQL and thus only the SQL types are relevant. #### Release note The affect of this change is to make the `INFORMATION_SCHEMA` tables more SQL-like. If a client was depending on numeric fields being strings, then that client may produce unexpected results. Similarly, if a client was expecting columns to be numbered from 0, then the new 1-based numbering may impact the client. Otherwise, clients will find that queries against `INFORMATION_SCHEMA` tables act more like a user would expect: more like they act in other databases. <hr> This PR has: - [X] been self-reviewed. - [X] added documentation for new or modified features or behaviors. - [X] a release note entry in the PR description. - [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
