Re: [Firebird-devel] User-defined aggregate functions

Mark Rotteveel Sat, 29 Sep 2018 00:55:10 -0700

On 28-9-2018 18:34, Adriano dos Santos Fernandes wrote:

Hi!


I propose creation of user-defined aggregate functions.

Lets initially define the requirements for it:
- It should work with empty data set too, in case of use without GROUP
BY clause: i.e., COUNT results in 0, while most others functions result
in NULL
- It should also produce "running (or accumulated) aggregation", like
SUM(...) OVER (ORDER BY ...)

What is not proposed here:
- Make a single function work with different source/target data types.
Custom aggregate function is statically typed as normal functions.

Design is open if good alternatives are presented. Initially I think:

- There should be one input parameter and one output parameter.
- SUSPEND is used to make engine feed the input parameter with a new
value, setting AGG_FINISHED to TRUE/FALSE, and returning running
(accumulated) aggregation value.
- Empty data set makes the function already starts with AGG_FINISHED =
TRUE, and non empty data set makes the function starts with AGG_FINISHED
= FALSE and the first row already fed in the input parameter
- When AGG_FINISHED is TRUE, it's expected that user code does not call
SUSPEND anymore, so only for the empty data set case, the value present
in the output parameter when the function ends is the result of the
aggregate function in the query.

I think the syntax below is rather verbose, and I wonder if it caneasily be used to support all types of aggregates.

It introduces some additional behavior and syntax that hasn't surfacedbefore in PSQL, like:

- a pass by reference call mechanism (the input parameter) to pump newvalues in, this may make it harder to use UDRs for this (not alllanguages (easily) support such call mechanisms)- a named output parameter for a function, which breaks syntaxconsistency with normal stored functions- an externally controlled context value (agg_finished), again somethingthat I don't can be easily supported in UDR

Although I haven't used it, I think something like PostgreSQL'saggregate (see https://www.postgresql.org/docs/10/static/xaggr.html andhttps://www.postgresql.org/docs/10/static/sql-createaggregate.html) maybe simpler, but that may require the introduction of some sort of recordtype (unless arrays are 'good enough') to be able to maintain and passmore complex state. There are also some other 'bears' their as it relieson PostgreSQLs object support.

We could also consider to have some aggregate object that has state andcontains the functions that the engine calls for calculating theaggregate. In a way, this aggregate would implement an interfacerequired by the engine.

That interface consists of a procedure state_transition to accept thevalue(s) of the aggregate, and modifies the internal state, and afunction final_state that returns the aggregate value.

The engine creates an instance of the aggregate, and calls thestate_transition to add values, and calls the final_state whenever itneeds the final result.


The sum example

create aggregate custom_sum(i bigint) returns bigint
as
   declare summed bigint = 0;
begin
   procedure state_transition(i bigint)
   as
   begin
     if (is is null) then leave;
     summed = summed + i;
   end

   function final_state returns bigint
   as
     return summed;
   end
end;

The average example. Be aware that this implementation (like yours) willhave a precision problem for a large number of small values, and may beproblematic for large values.


create aggregate custom_avg(i double precision) returns double precision
as
  declare current_count integer = 0;
  declare accumulated double precision = 0;
begin
  procedure state_transition(i double precision)
  as
  begin
    if (is is null) then leave;
    current_count = current_count + 1;
    accumulated = accumulated + i;
  end

  function final_state returns double precision
    return accumulated / current_count;
  end
end

Your final example (with a change to not count null):

create aggregate custom_count_plus_1000 (i integer) returns integer
as
  declare current_count integer;
begin
  procedure state_transition(i integer)
  as
  begin
    if (current_count is null) then
      current_count = 0;
    if (is is null) then leave;
    current_count = current_count + 1;
  end

  function final_state returns integer
  as
  begin
    if (current_count is null) then
       return 1000;
    else
       return current_count;
  end
end

This may be extended to also allow support for moving aggregates withwindow functions (see "Moving-Aggregate Mode" in the PostgreSQL docs) byhaving additional procedures and functions.

However it looks like the way PostgreSQL passes state between functionsmay actually be better, as it would probably be easier for supportinggrouping sets and other aggregations happening on multiple levels.


Taking the avg example again, such a form could look like:

create aggregate custom_avg(i double precision) returns double precision
as
begin

record_type custom_avg_record (current_count integer, accumulateddouble precision)


  function initial_state returns custom_avg_record
  as
  begin
    return custom_avg_record(0, 0);
  end

function state_transition(custom_avg_record previous_state, i doubleprecision) returns custom_avg_record

  as
  begin
    if (is is null) then
      return previous_state;

    return custom_avg_record(
        previous_state.current_count + 1,
        previous_state.accumulated + i);
  end

function final_state(custom_avg_record previous_state) returns doubleprecision

    return previous_state.accumulated / previous_state.current_count;
  end
end

Record type syntax etc just made up, would need more fleshing out (andpossibly aligned with supporting row values).

Below I put some example (tested mentally only) functions:


------------------
-- Works as standard SUM.
create aggregate function custom_sum (i integer) returns (o integer)
as
begin
     while (not agg_finished)
     do
     begin
         if (i is not null) then
         begin
             if (o is null) then
                 o = 0;

             o = o + i;
         end

         suspend;
     end
end


-- Works as standard AVG.
create aggregate function custom_avg (i double precision) returns (o
double precision)
as
     declare count integer = 0;
     declare accumulated double precision = 0;
begin
     while (not agg_finished)
     do
     begin
         if (i is not null) then
         begin
             count = count + 1;
             accumulated = accumulated + i;
             o = accumulated / count;
         end

         suspend;
     end
end


-- Works as standard COUNT.
  create aggregate function custom_count (i integer) returns (o integer)
as
begin
     o = 0;

     while (not agg_finished)
     do
     begin
         if (i is not null) then
             o = o + 1;
         suspend;
     end
end


-- This function shows the difference of returning value in SUSPEND when
data set is not empty and returning in function termination when data
set is empty.
-- select custom_count_plus_1000(1) from rdb$database -- returns 1
-- select custom_count_plus_1000(1) from rdb$database where 1 = 0 --
returns 1000
create aggregate function custom_count_plus_1000 (i integer) returns (o
integer)
as
begin
     o = 0;

     while (not agg_finished)
     do
     begin
         o = o + 1;
         suspend;
     end

     o = o + 1000;
end


--
Mark Rotteveel


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Re: [Firebird-devel] User-defined aggregate functions

Reply via email to