Hi Mike,

Good call — those are real, distinct categories, and the engine already
separates them internally, so list_functions can label rather than guess:


   -  SQL++ core aggregates — the array_* (NULL/MISSING-ignoring) and
   strict_*/coll_* (NULL-preserving) functions that take a collection. These
   are the actual registered runtime functions.
   - SQL-92 aggregates (MIN, MAX, SUM, AVG, COUNT, STDDEV, VAR…): these
   aren't separate functions. They're surface syntax that the SQL++ rewriter
   desugars, over the GROUP BY group, into calls to the core aggregates
   (SUM → array_sum, COUNT → array_count).
   FunctionMapUtil.isSql92AggregateFunction() and
   sql92ToCoreAggregateFunction() are the bridge. Worth flagging: enumerating
   the registry alone shows array_count, not COUNT, so to list the names
   users actually type, we'd walk that mapping.
   - Window functions (ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, NTILE,
   FIRST_VALUE, LAST_VALUE, NTH_VALUE, PERCENT_RANK, CUME_DIST,
   RATIO_TO_REPORT): these have their own registry (isWindowFunction()) and
   are used with OVER.
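
To make the "walk that mapping" point concrete, here is a minimal Java
sketch. Sql92AliasSketch, its SQL92_TO_CORE map, and listAggregates() are
hypothetical stand-ins, not the real FunctionMapUtil API, and the map holds
only the two pairs named above:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for the SQL-92 -> core aggregate bridge.
// It does not call the real FunctionMapUtil.
final class Sql92AliasSketch {
    // Illustrative subset: only the two pairs mentioned in this thread.
    static final Map<String, String> SQL92_TO_CORE = new LinkedHashMap<>();
    static {
        SQL92_TO_CORE.put("sum", "array_sum");
        SQL92_TO_CORE.put("count", "array_count");
    }

    // Builds listing rows (function name -> category label) from the core
    // names found in the registry, then adds each SQL-92 alias whose core
    // target is actually registered.
    static Map<String, String> listAggregates(Set<String> registeredCoreNames) {
        Map<String, String> rows = new TreeMap<>();
        for (String core : registeredCoreNames) {
            rows.put(core, "aggregate-core");
        }
        for (Map.Entry<String, String> alias : SQL92_TO_CORE.entrySet()) {
            if (registeredCoreNames.contains(alias.getValue())) {
                rows.put(alias.getKey(), "aggregate-sql");
            }
        }
        return rows;
    }
}
```

Enumerating the registry alone would produce only the aggregate-core rows;
the second loop is what surfaces the names users type.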


So the plan is to expose a category field on each function (scalar /
aggregate-core / aggregate-sql / window / unnest), derived from the
predicates the engine already has. That lets a client filter by category
instead of us hardcoding names, and it keeps the SQL-92 vs core distinction
explicit.
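
A rough sketch of how that field could be derived, assuming the engine
predicates are passed in as plain Predicate<String> values. CategorySketch,
its Category enum, and the parameter names are hypothetical; in the engine
they would delegate to checks such as isSql92AggregateFunction() and
isWindowFunction(). The order of the checks is also an assumption:

```java
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical classifier; the predicates stand in for the engine's checks.
final class CategorySketch {
    enum Category {
        SCALAR, AGGREGATE_CORE, AGGREGATE_SQL, WINDOW, UNNEST;

        // Label as it would appear in the category field, e.g. "aggregate-sql".
        String label() {
            return name().toLowerCase(Locale.ROOT).replace('_', '-');
        }
    }

    // Returns the first matching category; anything unmatched is scalar.
    static Category categorize(String name,
            Predicate<String> isSql92Aggregate,
            Predicate<String> isCoreAggregate,
            Predicate<String> isWindow,
            Predicate<String> isUnnest) {
        if (isSql92Aggregate.test(name)) {
            return Category.AGGREGATE_SQL;
        }
        if (isCoreAggregate.test(name)) {
            return Category.AGGREGATE_CORE;
        }
        if (isWindow.test(name)) {
            return Category.WINDOW;
        }
        if (isUnnest.test(name)) {
            return Category.UNNEST;
        }
        return Category.SCALAR;
    }
}
```

Because the label comes from predicates rather than a name list, a function
added by a registrant module would be categorized without any change on the
client side.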

Thanks for raising it — I'll fold these categories into the metadata design.

Best,
Vivek

On Mon, 8 Jun 2026 at 12:08, Mike Carey <[email protected]> wrote:

> Q:  Should there also be a way to list and distinguish SQL++ aggregate
> functions and SQL style weird aggregate “functions” (like MIN, MAX, SUM,
> AVG, and COUNT…)?  And what about SQL style window functions?  Just
> mentioning those categories too.
>
> Cheers,
> Mike
>
> On Sun, Jun 7, 2026 at 10:35 PM Vivek Gangavarapu <
> [email protected]> wrote:
>
> > Hi Ian,
> >
> > Thanks — that pointed me in the right direction. I dug through the source
> > to map out a proper long-term fix instead of maintaining a hardcoded list
> > in the MCP server, and I think the pieces line up cleanly.
> >
> > On why BuiltinFunctions.java is incomplete: I found the reason — the
> > runtime-complete set lives in two places:
> >
> >    1. BuiltinFunctions.registeredFunctions (asterix-om): the static
> >    metadata, including rewrite-only/datasource functions like dataset
> >    and ping.
> >    2. FunctionCollection.createDefaultFunctionCollection()
> >    (asterix-runtime), which, after adding the static scalar/aggregate
> >    factories, runs ServiceLoader.load(IFunctionRegistrant.class) and lets
> >    each module inject its functions.
> >
> > The geospatial ones come in here via GeoFunctionRegistrant, which is
> > exactly why they're absent from the static file (fuzzy-join and the
> > runtime parsers do the same). So a complete listing is the union of
> > those two, keyed by FunctionIdentifier. That handles the bootstrap-added
> > functions automatically.
> >
> > *Proposed implementation:* A list_functions() datasource function,
> > following the existing ping() pattern (Rewriter → Datasource → Function →
> > Reader, registered in MetadataBuiltinFunctions). It would be queryable
> > and joinable:
> >
> >    SELECT name, arity FROM list_functions() WHERE name LIKE "%geo%";
> >
> > *Phasing:*
> >
> >    - *Phase 1 (name, dataverse, arity):* Arity is straightforward from
> >    FunctionIdentifier.getArity(), with a variadic flag for the VARARGS
> >    (-1) case.
> >    - *Internal-function filter (default on):* I think the markers we need
> >    already exist: the isPrivate flag set by addPrivateFunction, plus the
> >    aggregateTo{Local,Intermediate,Global,Serializable}Aggregate maps,
> >    whose values are the internal partial-aggregate helpers. I'd exclude
> >    those by default and add an include_internal flag to override. Does
> >    that match your sense of which functions should be hidden, or are
> >    there other categories I'm missing?
> >    - *Phase 2 (type restrictions; best-effort, later):* I agree this is
> >    the hard part. The type info is in IResultTypeComputer, which is
> >    output-oriented and context-dependent, so there's no static
> >    input-signature table to read. My plan is to leave it null/unknown
> >    initially rather than block Phase 1 on it.
> >
> > One thing I'd like your read on: the reader runs in the function's
> > execution context, so I need to confirm the cleanest way to reach the
> > live FunctionCollection and BuiltinFunctions registries from there (via
> > MetadataProvider / app context). If there's an existing accessor you'd
> > point me to, that'd save me some hunting.
> >
> >
> > Thanks,
> >
> > Vivek
> >
>
