[GitHub] [druid] abhishekrb19 opened a new issue, #14346: Expose Druid functions via `INFORMATION_SCHEMA.ROUTINES` table.

via GitHub Sun, 28 May 2023 21:22:35 -0700


abhishekrb19 opened a new issue, #14346:
URL: https://github.com/apache/druid/issues/14346

### Description

Expose Druid functions and operators programmatically via the Druid SQL
interface.

### Motivation:
- As a Druid user, I want to be able to programmatically retrieve metadata
about registered functions so that I can inspect the system and write
automation/tooling that deals with various functions.
- As a Druid client, I want to be able to retrieve the full list of
available functions from a Druid cluster (even from extensions) so I don't have
to write code to deal with new functions, types, etc. For example, the Druid
web console relies on the [documented
functions](https://github.com/apache/druid/blob/f6a0888bc07396a4afab91f5cb576793e26275d2/web-console/script/create-sql-docs.js#L66)
to parse them at build time to provide suggestions at runtime. However, this
information can be incomplete as new functions can be loaded through extensions
at runtime.

### Design:
Add a new table `INFORMATION_SCHEMA.ROUTINES` that exposes SQL functions and
operators. The `INFORMATION_SCHEMA.ROUTINES` table will include the following
columns:
1. ROUTINE_CATALOG: Name of the database. This will always be set to `druid`.
2. ROUTINE_SCHEMA: Name of the schema. This will always be set to
`INFORMATION_SCHEMA`.
3. ROUTINE_NAME: Name of the function or operator. E.g.,
`APPROX_COUNT_DISTINCT_DS_THETA`
4. ROUTINE_TYPE: Type of routine - `FUNCTION`, `FUNCTION_STAR`, `POSTFIX`,
etc.
5. IS_DETERMINISTIC: Is the routine deterministic. Returns `YES` for
deterministic; `NO` if the routine is nondeterministic.
6. IS_AGGREGATOR: Is the routine an aggregator or not. Returns `YES` for
aggregator functions; `NO` for scalar functions.
7. SIGNATURES: Possible signatures for the routine as a string.

Note that Druid-specific columns such as `IS_AGGREGATOR`, `SIGNATURES` are
also included besides the standard set.

**Example usage:**
To see information about all the aggregator functions, including ones loaded
from extensions, run the following SQL query:

```sql
SELECT "ROUTINE_NAME", "IS_DETERMINISTIC", "SIGNATURES"
FROM "INFORMATION_SCHEMA"."ROUTINES"
WHERE "ROUTINE_TYPE" = 'FUNCTION' AND 'IS_AGGREGATOR' = 'YES'
```

**Other alternatives:** Custom `sys` tables were considered. However,
`INFORMATION_SCHEMA.ROUTINES` is the SQL standard to expose stored procedures,
routines, and built-in types.

### Implementation sketch:
Calcite knows about the registered operators, including the ones registered
in extensions. So extracting this information from Calcite would be the best
way. Therefore, we can use the `DruidOperatorTable` that implements Calcite's
operator table interface. The `DruidOperatorTable` is aware of all the
registered operators at runtime, so we can wire this up into the new
`INFORMATION_SCHEMA` table.

### Future work:
In addition to the above columns, I think we could also add more columns
over time, including:
1. Extension/module name
2. Return type
3. Description of the routine, if available.

Getting this information may involve adding more functionality to
`DruidOperatorTable`.

Note that this proposal is only for the Druid functions and operators.
Exposing data types would be a separate proposal on its own.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] abhishekrb19 opened a new issue, #14346: Expose Druid functions via `INFORMATION_SCHEMA.ROUTINES` table.

Reply via email to