Re: [Firebird-devel] User-defined aggregate functions

Adriano dos Santos Fernandes Sat, 29 Sep 2018 05:03:31 -0700

Mark, saying that an object-based interface is better to store state or the
way to support external routines is disregard how good Firebird selectable
procedures are.


Selectable procedures stores intermediary states in a very elegant way.

It's also the way modern languages implemented the same concept recently,
say JavaScript (ecmascript) and C# with yield.

My propose is to use the same elegant concept for aggregation. Your example
seems much more confusing. It makes sense thinking on interfaces, but very
different from Firebird way of doing things.

Also, it does not use pass-by-reference concept. It uses underlying
Firebird mechanism of messages, the same used for initial parameters values
and return values. PSQL syntax may be changed to.make it more clear that
it's not a parameter by reference.

External procedures are already supported with intermediary states. I'm
sure the interface for external aggregate function would be different than
standard functions and procedures, but not very different. Just that, each
functionality with their required interfaces.

For grouping set support, it's something I must investigate better. I
sometimes use it in Oracle, but did not thought about it in this spec or
even for ours internal aggregate functions.


Adriano


On Sat, Sep 29, 2018, 04:54 Mark Rotteveel <m...@lawinegevaar.nl> wrote:

> On 28-9-2018 18:34, Adriano dos Santos Fernandes wrote:
> > Hi!
> >
> > I propose creation of user-defined aggregate functions.
> >
> > Lets initially define the requirements for it:
> > - It should work with empty data set too, in case of use without GROUP
> > BY clause: i.e., COUNT results in 0, while most others functions result
> > in NULL
> > - It should also produce "running (or accumulated) aggregation", like
> > SUM(...) OVER (ORDER BY ...)
> >
> > What is not proposed here:
> > - Make a single function work with different source/target data types.
> > Custom aggregate function is statically typed as normal functions.
> >
> > Design is open if good alternatives are presented. Initially I think:
> >
> > - There should be one input parameter and one output parameter.
> > - SUSPEND is used to make engine feed the input parameter with a new
> > value, setting AGG_FINISHED to TRUE/FALSE, and returning running
> > (accumulated) aggregation value.
> > - Empty data set makes the function already starts with AGG_FINISHED =
> > TRUE, and non empty data set makes the function starts with AGG_FINISHED
> > = FALSE and the first row already fed in the input parameter
> > - When AGG_FINISHED is TRUE, it's expected that user code does not call
> > SUSPEND anymore, so only for the empty data set case, the value present
> > in the output parameter when the function ends is the result of the
> > aggregate function in the query.
>
> I think the syntax below is rather verbose, and I wonder if it can
> easily be used to support all types of aggregates.
>
> It introduces some additional behavior and syntax that hasn't surfaced
> before in PSQL, like:
>
> - a pass by reference call mechanism (the input parameter) to pump new
> values in, this may make it harder to use UDRs for this (not all
> languages (easily) support such call mechanisms)
> - a named output parameter for a function, which breaks syntax
> consistency with normal stored functions
> - an externally controlled context value (agg_finished), again something
> that I don't can be easily supported in UDR
>
> Although I haven't used it, I think something like PostgreSQL's
> aggregate (see https://www.postgresql.org/docs/10/static/xaggr.html and
> https://www.postgresql.org/docs/10/static/sql-createaggregate.html) may
> be simpler, but that may require the introduction of some sort of record
> type (unless arrays are 'good enough') to be able to maintain and pass
> more complex state. There are also some other 'bears' their as it relies
> on PostgreSQLs object support.
>
> We could also consider to have some aggregate object that has state and
> contains the functions that the engine calls for calculating the
> aggregate. In a way, this aggregate would implement an interface
> required by the engine.
>
> That interface consists of a procedure state_transition to accept the
> value(s) of the aggregate, and modifies the internal state, and a
> function final_state that returns the aggregate value.
>
> The engine creates an instance of the aggregate, and calls the
> state_transition to add values, and calls the final_state whenever it
> needs the final result.
>
> The sum example
>
> create aggregate custom_sum(i bigint) returns bigint
> as
>     declare summed bigint = 0;
> begin
>     procedure state_transition(i bigint)
>     as
>     begin
>       if (is is null) then leave;
>       summed = summed + i;
>     end
>
>     function final_state returns bigint
>     as
>       return summed;
>     end
> end;
>
> The average example. Be aware that this implementation (like yours) will
> have a precision problem for a large number of small values, and may be
> problematic for large values.
>
> create aggregate custom_avg(i double precision) returns double precision
> as
>    declare current_count integer = 0;
>    declare accumulated double precision = 0;
> begin
>    procedure state_transition(i double precision)
>    as
>    begin
>      if (is is null) then leave;
>      current_count = current_count + 1;
>      accumulated = accumulated + i;
>    end
>
>    function final_state returns double precision
>      return accumulated / current_count;
>    end
> end
>
> Your final example (with a change to not count null):
>
> create aggregate custom_count_plus_1000 (i integer) returns integer
> as
>    declare current_count integer;
> begin
>    procedure state_transition(i integer)
>    as
>    begin
>      if (current_count is null) then
>        current_count = 0;
>      if (is is null) then leave;
>      current_count = current_count + 1;
>    end
>
>    function final_state returns integer
>    as
>    begin
>      if (current_count is null) then
>         return 1000;
>      else
>         return current_count;
>    end
> end
>
> This may be extended to also allow support for moving aggregates with
> window functions (see "Moving-Aggregate Mode" in the PostgreSQL docs) by
> having additional procedures and functions.
>
> However it looks like the way PostgreSQL passes state between functions
> may actually be better, as it would probably be easier for supporting
> grouping sets and other aggregations happening on multiple levels.
>
> Taking the avg example again, such a form could look like:
>
> create aggregate custom_avg(i double precision) returns double precision
> as
> begin
>    record_type custom_avg_record (current_count integer, accumulated
> double precision)
>
>    function initial_state returns custom_avg_record
>    as
>    begin
>      return custom_avg_record(0, 0);
>    end
>
>    function state_transition(custom_avg_record previous_state, i double
> precision) returns custom_avg_record
>    as
>    begin
>      if (is is null) then
>        return previous_state;
>
>      return custom_avg_record(
>          previous_state.current_count + 1,
>          previous_state.accumulated + i);
>    end
>
>    function final_state(custom_avg_record previous_state) returns double
> precision
>      return previous_state.accumulated / previous_state.current_count;
>    end
> end
>
> Record type syntax etc just made up, would need more fleshing out (and
> possibly aligned with supporting row values).
>
> > Below I put some example (tested mentally only) functions:
> >
> >
> > ------------------
> > -- Works as standard SUM.
> > create aggregate function custom_sum (i integer) returns (o integer)
> > as
> > begin
> >      while (not agg_finished)
> >      do
> >      begin
> >          if (i is not null) then
> >          begin
> >              if (o is null) then
> >                  o = 0;
> >
> >              o = o + i;
> >          end
> >
> >          suspend;
> >      end
> > end
> >
> >
> > -- Works as standard AVG.
> > create aggregate function custom_avg (i double precision) returns (o
> > double precision)
> > as
> >      declare count integer = 0;
> >      declare accumulated double precision = 0;
> > begin
> >      while (not agg_finished)
> >      do
> >      begin
> >          if (i is not null) then
> >          begin
> >              count = count + 1;
> >              accumulated = accumulated + i;
> >              o = accumulated / count;
> >          end
> >
> >          suspend;
> >      end
> > end
> >
> >
> > -- Works as standard COUNT.
> >   create aggregate function custom_count (i integer) returns (o integer)
> > as
> > begin
> >      o = 0;
> >
> >      while (not agg_finished)
> >      do
> >      begin
> >          if (i is not null) then
> >              o = o + 1;
> >          suspend;
> >      end
> > end
> >
> >
> > -- This function shows the difference of returning value in SUSPEND when
> > data set is not empty and returning in function termination when data
> > set is empty.
> > -- select custom_count_plus_1000(1) from rdb$database -- returns 1
> > -- select custom_count_plus_1000(1) from rdb$database where 1 = 0 --
> > returns 1000
> > create aggregate function custom_count_plus_1000 (i integer) returns (o
> > integer)
> > as
> > begin
> >      o = 0;
> >
> >      while (not agg_finished)
> >      do
> >      begin
> >          o = o + 1;
> >          suspend;
> >      end
> >
> >      o = o + 1000;
> > end
>
> --
> Mark Rotteveel
>
>
> Firebird-Devel mailing list, web interface at
> https://lists.sourceforge.net/lists/listinfo/firebird-devel
>

Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Re: [Firebird-devel] User-defined aggregate functions

Reply via email to