abhishekrb19 opened a new issue, #14346: URL: https://github.com/apache/druid/issues/14346
### Description Expose Druid functions and operators programmatically via the Druid SQL interface. ### Motivation: - As a Druid user, I want to be able to programmatically retrieve metadata about registered functions so that I can inspect the system and write automation/tooling that deals with various functions. - As a Druid client, I want to be able to retrieve the full list of available functions from a Druid cluster (even from extensions) so I don't have to write code to deal with new functions, types, etc. For example, the Druid web console relies on the [documented functions](https://github.com/apache/druid/blob/f6a0888bc07396a4afab91f5cb576793e26275d2/web-console/script/create-sql-docs.js#L66) to parse them at build time to provide suggestions at runtime. However, this information can be incomplete as new functions can be loaded through extensions at runtime. ### Design: Add a new table `INFORMATION_SCHEMA.ROUTINES` that exposes SQL functions and operators. The `INFORMATION_SCHEMA.ROUTINES` table will include the following columns: 1. ROUTINE_CATALOG: Name of the database. This will always be set to `druid`. 2. ROUTINE_SCHEMA: Name of the schema. This will always be set to `INFORMATION_SCHEMA`. 3. ROUTINE_NAME: Name of the function or operator. E.g., `APPROX_COUNT_DISTINCT_DS_THETA` 4. ROUTINE_TYPE: Type of routine - `FUNCTION`, `FUNCTION_STAR`, `POSTFIX`, etc. 5. IS_DETERMINISTIC: Is the routine deterministic. Returns `YES` for deterministic; `NO` if the routine is nondeterministic. 6. IS_AGGREGATOR: Is the routine an aggregator or not. Returns `YES` for aggregator functions; `NO` for scalar functions. 7. SIGNATURES: Possible signatures for the routine as a string. Note that Druid-specific columns such as `IS_AGGREGATOR`, `SIGNATURES` are also included besides the standard set. **Example usage:** To see information about all the aggregator functions, including ones loaded from extensions, run the following SQL query: ```sql SELECT "ROUTINE_NAME", "IS_DETERMINISTIC", "SIGNATURES" FROM "INFORMATION_SCHEMA"."ROUTINES" WHERE "ROUTINE_TYPE" = 'FUNCTION' AND 'IS_AGGREGATOR' = 'YES' ``` **Other alternatives:** Custom `sys` tables were considered. However, `INFORMATION_SCHEMA.ROUTINES` is the SQL standard to expose stored procedures, routines, and built-in types. ### Implementation sketch: Calcite knows about the registered operators, including the ones registered in extensions. So extracting this information from Calcite would be the best way. Therefore, we can use the `DruidOperatorTable` that implements Calcite's operator table interface. The `DruidOperatorTable` is aware of all the registered operators at runtime, so we can wire this up into the new `INFORMATION_SCHEMA` table. ### Future work: In addition to the above columns, I think we could also add more columns over time, including: 1. Extension/module name 2. Return type 3. Description of the routine, if available. Getting this information may involve adding more functionality to `DruidOperatorTable`. Note that this proposal is only for the Druid functions and operators. Exposing data types would be a separate proposal on its own. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
