abhishekrb19 opened a new issue, #14346:
URL: https://github.com/apache/druid/issues/14346

   ### Description
   
   Expose Druid functions and operators programmatically via the Druid SQL 
interface.
   
   ### Motivation:
   - As a Druid user, I want to be able to programmatically retrieve metadata 
about registered functions so that I can inspect the system and write 
automation/tooling that deals with various functions.
   - As a Druid client, I want to be able to retrieve the full list of 
available functions from a Druid cluster (even from extensions) so I don't have 
to write code to deal with new functions, types, etc. For example, the Druid 
web console relies on the [documented 
functions](https://github.com/apache/druid/blob/f6a0888bc07396a4afab91f5cb576793e26275d2/web-console/script/create-sql-docs.js#L66)
 to parse them at build time to provide suggestions at runtime. However, this 
information can be incomplete as new functions can be loaded through extensions 
at runtime.
   
   
   
   ### Design:
   Add a new table `INFORMATION_SCHEMA.ROUTINES` that exposes SQL functions and 
operators.  The `INFORMATION_SCHEMA.ROUTINES` table will include the following 
columns:
   1. ROUTINE_CATALOG: Name of the database. This will always be set to `druid`.
   2. ROUTINE_SCHEMA: Name of the schema. This will always be set to 
`INFORMATION_SCHEMA`.
   3. ROUTINE_NAME: Name of the function or operator. E.g., 
`APPROX_COUNT_DISTINCT_DS_THETA`
   4. ROUTINE_TYPE:  Type of routine - `FUNCTION`, `FUNCTION_STAR`, `POSTFIX`, 
etc.
   5. IS_DETERMINISTIC:  Is the routine deterministic. Returns `YES` for 
deterministic; `NO` if the routine is nondeterministic.
   6. IS_AGGREGATOR: Is the routine an aggregator or not. Returns `YES` for 
aggregator functions; `NO` for scalar functions.
   7. SIGNATURES: Possible signatures for the routine as a string.
   
   Note that Druid-specific columns such as `IS_AGGREGATOR`, `SIGNATURES` are 
also included besides the standard set.
   
   **Example usage:**
   To see information about all the aggregator functions, including ones loaded 
from extensions, run the following SQL query:
   
   ```sql
   SELECT "ROUTINE_NAME", "IS_DETERMINISTIC", "SIGNATURES"
   FROM "INFORMATION_SCHEMA"."ROUTINES"
   WHERE "ROUTINE_TYPE" = 'FUNCTION' AND 'IS_AGGREGATOR' = 'YES'
   ```
   
   **Other alternatives:** Custom `sys` tables were considered. However, 
`INFORMATION_SCHEMA.ROUTINES` is the SQL standard to expose stored procedures, 
routines, and built-in types.
   
   
   ### Implementation sketch:
   Calcite knows about the registered operators, including the ones registered 
in extensions. So extracting this information from Calcite would be the best 
way.  Therefore, we can use the `DruidOperatorTable` that implements Calcite's 
operator table interface. The `DruidOperatorTable` is aware of all the 
registered operators at runtime, so we can wire this up into the new 
`INFORMATION_SCHEMA` table.
   
   ### Future work:
   In addition to the above columns, I think we could also add more columns 
over time, including:
   1. Extension/module name
   2. Return type
   3. Description of the routine, if available.
    
   Getting this information may involve adding more functionality to 
`DruidOperatorTable`.
   
   
   
   Note that this proposal is only for the Druid functions and operators. 
Exposing data types would be a separate proposal on its own.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to