[jira] [Created] (CALCITE-4483) Add WITHIN DISTINCT clause for aggregate functions

Julian Hyde (Jira) Mon, 01 Feb 2021 11:31:18 -0800

Julian Hyde created CALCITE-4483:
------------------------------------

             Summary: Add WITHIN DISTINCT clause for aggregate functions
                 Key: CALCITE-4483
                 URL: https://issues.apache.org/jira/browse/CALCITE-4483
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde



Add a {{WITHIN DISTINCT}} clause to aggregate functions, allowing duplicate 
rows to be eliminated before entering the function.

This feature is non-standard, and in fact does not exist in any database I am 
aware of.

It is related to {{DISTINCT}}, and is in fact a generalization of it. 
{{DISTINCT}} can always be rewritten in terms of {{WITHIN DISTINCT}}. For 
example, {{SUM(DISTINCT sal)}} is equivalent to {{SUM(sal) WITHIN DISTINCT 
(sal)}}.

Consider the query
{noformat}
SELECT SUM(age),
   SUM(DISTINCT age),
  SUM(age) WITHIN DISTINCT (name)
FROM Friends{noformat}
where {{Friends}} has the rows
{noformat}
name   age job
====== === ==========
Julian  16 Programmer
Dick    15
Anne    13 Car wash
George  15 Lifeguard
George  15 Dog walker
Timmy    4
{noformat}
Note that there are two rows for George, because she has two jobs.

The values of the columns are as follows:
 * {{SUM(age)}} has the value (16 + 15 + 13 + 15 + 15 + 4) = 78;
 * {{SUM(DISTINCT age)}} has the value (16 + 15 + 13 + 4) = 48;
 * {{SUM(age) WITHIN DISTINCT (name)}} has the value (16 + 15 + 13 + 15 + 4) = 
63.

{{WITHIN DISTINCT}} has treated the two 15 values for George as one value, but 
has still counted the 15 for Dick separately.

The {{WITHIN DISTINCT}} clause can be useful to prevent double-counting when 
duplicates have been added via a many-to-one join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (CALCITE-4483) Add WITHIN DISTINCT clause for aggregate functions

Reply via email to