On 28-9-2018 18:34, Adriano dos Santos Fernandes wrote:
Hi!
I propose creation of user-defined aggregate functions.
Lets initially define the requirements for it:
- It should work with empty data set too, in case of use without GROUP
BY clause: i.e., COUNT results in 0, while most others functions result
in NULL
- It should also produce "running (or accumulated) aggregation", like
SUM(...) OVER (ORDER BY ...)
What is not proposed here:
- Make a single function work with different source/target data types.
Custom aggregate function is statically typed as normal functions.
Design is open if good alternatives are presented. Initially I think:
- There should be one input parameter and one output parameter.
- SUSPEND is used to make engine feed the input parameter with a new
value, setting AGG_FINISHED to TRUE/FALSE, and returning running
(accumulated) aggregation value.
- Empty data set makes the function already starts with AGG_FINISHED =
TRUE, and non empty data set makes the function starts with AGG_FINISHED
= FALSE and the first row already fed in the input parameter
- When AGG_FINISHED is TRUE, it's expected that user code does not call
SUSPEND anymore, so only for the empty data set case, the value present
in the output parameter when the function ends is the result of the
aggregate function in the query.
I think the syntax below is rather verbose, and I wonder if it can
easily be used to support all types of aggregates.
It introduces some additional behavior and syntax that hasn't surfaced
before in PSQL, like:
- a pass by reference call mechanism (the input parameter) to pump new
values in, this may make it harder to use UDRs for this (not all
languages (easily) support such call mechanisms)
- a named output parameter for a function, which breaks syntax
consistency with normal stored functions
- an externally controlled context value (agg_finished), again something
that I don't can be easily supported in UDR
Although I haven't used it, I think something like PostgreSQL's
aggregate (see https://www.postgresql.org/docs/10/static/xaggr.html and
https://www.postgresql.org/docs/10/static/sql-createaggregate.html) may
be simpler, but that may require the introduction of some sort of record
type (unless arrays are 'good enough') to be able to maintain and pass
more complex state. There are also some other 'bears' their as it relies
on PostgreSQLs object support.
We could also consider to have some aggregate object that has state and
contains the functions that the engine calls for calculating the
aggregate. In a way, this aggregate would implement an interface
required by the engine.
That interface consists of a procedure state_transition to accept the
value(s) of the aggregate, and modifies the internal state, and a
function final_state that returns the aggregate value.
The engine creates an instance of the aggregate, and calls the
state_transition to add values, and calls the final_state whenever it
needs the final result.
The sum example
create aggregate custom_sum(i bigint) returns bigint
as
declare summed bigint = 0;
begin
procedure state_transition(i bigint)
as
begin
if (is is null) then leave;
summed = summed + i;
end
function final_state returns bigint
as
return summed;
end
end;
The average example. Be aware that this implementation (like yours) will
have a precision problem for a large number of small values, and may be
problematic for large values.
create aggregate custom_avg(i double precision) returns double precision
as
declare current_count integer = 0;
declare accumulated double precision = 0;
begin
procedure state_transition(i double precision)
as
begin
if (is is null) then leave;
current_count = current_count + 1;
accumulated = accumulated + i;
end
function final_state returns double precision
return accumulated / current_count;
end
end
Your final example (with a change to not count null):
create aggregate custom_count_plus_1000 (i integer) returns integer
as
declare current_count integer;
begin
procedure state_transition(i integer)
as
begin
if (current_count is null) then
current_count = 0;
if (is is null) then leave;
current_count = current_count + 1;
end
function final_state returns integer
as
begin
if (current_count is null) then
return 1000;
else
return current_count;
end
end
This may be extended to also allow support for moving aggregates with
window functions (see "Moving-Aggregate Mode" in the PostgreSQL docs) by
having additional procedures and functions.
However it looks like the way PostgreSQL passes state between functions
may actually be better, as it would probably be easier for supporting
grouping sets and other aggregations happening on multiple levels.
Taking the avg example again, such a form could look like:
create aggregate custom_avg(i double precision) returns double precision
as
begin
record_type custom_avg_record (current_count integer, accumulated
double precision)
function initial_state returns custom_avg_record
as
begin
return custom_avg_record(0, 0);
end
function state_transition(custom_avg_record previous_state, i double
precision) returns custom_avg_record
as
begin
if (is is null) then
return previous_state;
return custom_avg_record(
previous_state.current_count + 1,
previous_state.accumulated + i);
end
function final_state(custom_avg_record previous_state) returns double
precision
return previous_state.accumulated / previous_state.current_count;
end
end
Record type syntax etc just made up, would need more fleshing out (and
possibly aligned with supporting row values).
Below I put some example (tested mentally only) functions:
------------------
-- Works as standard SUM.
create aggregate function custom_sum (i integer) returns (o integer)
as
begin
while (not agg_finished)
do
begin
if (i is not null) then
begin
if (o is null) then
o = 0;
o = o + i;
end
suspend;
end
end
-- Works as standard AVG.
create aggregate function custom_avg (i double precision) returns (o
double precision)
as
declare count integer = 0;
declare accumulated double precision = 0;
begin
while (not agg_finished)
do
begin
if (i is not null) then
begin
count = count + 1;
accumulated = accumulated + i;
o = accumulated / count;
end
suspend;
end
end
-- Works as standard COUNT.
create aggregate function custom_count (i integer) returns (o integer)
as
begin
o = 0;
while (not agg_finished)
do
begin
if (i is not null) then
o = o + 1;
suspend;
end
end
-- This function shows the difference of returning value in SUSPEND when
data set is not empty and returning in function termination when data
set is empty.
-- select custom_count_plus_1000(1) from rdb$database -- returns 1
-- select custom_count_plus_1000(1) from rdb$database where 1 = 0 --
returns 1000
create aggregate function custom_count_plus_1000 (i integer) returns (o
integer)
as
begin
o = 0;
while (not agg_finished)
do
begin
o = o + 1;
suspend;
end
o = o + 1000;
end
--
Mark Rotteveel
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel