Hi Mike,

Good call: those are real, distinct categories, and the engine already separates them internally, so list_functions can label them rather than guess:

- SQL++ core aggregates: the array_* (NULL/MISSING-ignoring) and strict_*/coll_* (NULL-preserving) functions that take a collection. These are the actual registered runtime functions.
- SQL-92 aggregates (MIN, MAX, SUM, AVG, COUNT, STDDEV, VAR…): these aren't separate functions. They're surface sugar that the SQL++ rewriter desugars over the GROUP BY group onto the core aggregates (SUM → array_sum, COUNT → array_count). FunctionMapUtil.isSql92AggregateFunction() / sql92ToCoreAggregateFunction() is the bridge. Worth flagging: enumerating the registry alone shows array_count, not COUNT, so to list the names users actually type, we'd walk that mapping.
- Window functions (ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, NTILE, FIRST_VALUE, LAST_VALUE, NTH_VALUE, PERCENT_RANK, CUME_DIST, RATIO_TO_REPORT): these have their own registry (isWindowFunction()) and are used with OVER.

So the plan is to expose a category field on each function (scalar / aggregate-core / aggregate-sql / window / unnest), derived from the predicates the engine already has. That lets a client filter by category instead of us hardcoding names, and it keeps the SQL-92 vs core distinction explicit.

Thanks for raising it. I'll fold these categories into the metadata design.

Best,
Vivek

On Mon, 8 Jun 2026 at 12:08, Mike Carey <[email protected]> wrote:

> Q: Should there also be a way to list and distinguish SQL++ aggregate
> functions and SQL style weird aggregate “functions” (like MIN, MAX, SUM,
> AVG, and COUNT…)? And what about SQL style window functions? Just
> mentioning those categories too.
>
> Cheers,
> Mike
>
> On Sun, Jun 7, 2026 at 10:35 PM Vivek Gangavarapu <[email protected]> wrote:
>
> > Hi Ian,
> >
> > Thanks, that pointed me in the right direction. I dug through the source
> > to map out a proper long-term fix instead of maintaining a hardcoded list
> > in the MCP server, and I think the pieces line up cleanly.
> >
> > On why BuiltinFunctions.java is incomplete: I found the reason. The
> > runtime-complete set lives in two places:
> >
> > 1. BuiltinFunctions.registeredFunctions (asterix-om): the static metadata,
> >    including rewrite-only/datasource functions like dataset and ping.
> > 2. FunctionCollection.createDefaultFunctionCollection() (asterix-runtime),
> >    which, after adding the static scalar/aggregate factories, runs
> >    ServiceLoader.load(IFunctionRegistrant.class) and lets each module
> >    inject its functions.
> >
> > The geospatial ones come in here via GeoFunctionRegistrant, which is
> > exactly why they're absent from the static file (fuzzy-join and the runtime
> > parsers do the same). So a complete listing is the union of those two,
> > keyed by FunctionIdentifier. That handles the bootstrap-added functions
> > automatically.
> >
> > *Proposed implementation:* A list_functions() datasource function,
> > following the existing ping() pattern (Rewriter → Datasource → Function →
> > Reader, registered in MetadataBuiltinFunctions). It would be queryable and
> > joinable: SELECT name, arity FROM list_functions() WHERE name LIKE "%geo%";
> >
> > *Phasing:*
> >
> > - *Phase 1 (name, dataverse, arity):* Arity is straightforward from
> >   FunctionIdentifier.getArity(), with a variadic flag for the VARARGS (-1)
> >   case.
> > - *Internal-function filter (default on):* I think the markers we need
> >   already exist: the isPrivate flag set by addPrivateFunction, plus the
> >   aggregateTo{Local,Intermediate,Global,Serializable}Aggregate maps whose
> >   values are the internal partial-aggregate helpers. I'd exclude those by
> >   default and add an include_internal flag to override. Does that match
> >   your sense of which functions should be hidden, or are there other
> >   categories I'm missing?
> > - *Phase 2 (type restrictions; best-effort, later):* I agree this is the
> >   hard part. The type info is in IResultTypeComputer, which is
> >   output-oriented and context-dependent, so there's no static
> >   input-signature table to read. My plan is to leave it null/unknown
> >   initially rather than block Phase 1 on it.
> >
> > One thing I'd like your read on: the reader runs in the function's
> > execution context, so I need to confirm the cleanest way to reach the live
> > FunctionCollection and BuiltinFunctions registries from there (via
> > MetadataProvider / app context). If there's an existing accessor you'd
> > point me to, that'd save me some hunting.
> >
> > Thanks,
> >
> > Vivek
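
To make the category field proposed at the top of this thread concrete, here is a minimal Java sketch of how each function name could be labeled, and of why the SQL-92 names have to be added by walking the mapping rather than the registry. It is an illustration under stated assumptions only: the class FunctionCategorySketch, its Category enum, and the small hard-coded sets are hypothetical stand-ins for the engine's real predicates (FunctionMapUtil.isSql92AggregateFunction(), isWindowFunction(), and so on), which this code does not call. The sample entries, including range as an unnest function and lower as a scalar, are assumptions chosen for the example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: none of these class or field names exist in the engine.
class FunctionCategorySketch {

    enum Category { SCALAR, AGGREGATE_CORE, AGGREGATE_SQL, WINDOW, UNNEST }

    // Stand-in for the SQL-92 -> core aggregate mapping (sample entries only).
    static final Map<String, String> SQL92_TO_CORE =
            Map.of("sum", "array_sum", "count", "array_count");

    // Stand-ins for the engine's registries and predicates (sample entries only).
    static final Set<String> CORE_AGGREGATES = Set.of("array_sum", "array_count");
    static final Set<String> WINDOW_FUNCTIONS = Set.of("row_number", "rank", "lead", "lag");
    static final Set<String> UNNEST_FUNCTIONS = Set.of("range"); // assumed sample entry

    // Label one function name; anything unrecognized falls back to SCALAR.
    static Category categorize(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        if (WINDOW_FUNCTIONS.contains(key)) return Category.WINDOW;
        if (SQL92_TO_CORE.containsKey(key)) return Category.AGGREGATE_SQL;
        if (CORE_AGGREGATES.contains(key)) return Category.AGGREGATE_CORE;
        if (UNNEST_FUNCTIONS.contains(key)) return Category.UNNEST;
        return Category.SCALAR;
    }

    // Union of the registry names and the SQL-92 surface names, sorted by name.
    static Map<String, Category> listWithCategories(Set<String> registryNames) {
        Map<String, Category> out = new TreeMap<>();
        for (String n : registryNames) {
            out.put(n.toLowerCase(Locale.ROOT), categorize(n));
        }
        // The registry alone would show array_count but not COUNT, so add the
        // names users actually type by walking the mapping's keys.
        for (String n : SQL92_TO_CORE.keySet()) {
            out.put(n, categorize(n));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> registry = Set.of("array_sum", "array_count", "row_number", "range", "lower");
        listWithCategories(registry).forEach((n, c) -> System.out.println(n + " -> " + c));
    }
}
```

In a real implementation the three sets and the map would be replaced by calls into the existing predicates, so the categories stay derived rather than hardcoded; the order of the checks in categorize is a design choice of this sketch, not something the engine prescribes.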
