I always wished there was a variant of LIMIT that operated per key. E.g. return the youngest 2 employees in each department:
SELECT * FROM Amp ORDER BY deptno LIMIT 2, birthdate DESC (This is not standard syntax.) DISTINCT ON is the special case LIMIT 1. But LIMIT 1 would not mess with SQL evaluation order the way DISTINCT ON does. Also, per-key limit is something that a sort-limit physical operator could usefully and efficiently do. > On Mar 4, 2018, at 8:09 PM, Aman Sinha <[email protected]> wrote: > > So far, I haven't had much success.. other databases such as Oracle, SQL > Server and DB2 don't seem to support this at least based on their published > SQL reference guides. > > One way people work around this is to use ROW_NUMBER() window function to > achieve similar results. > e.g > SELECT a, b, c FROM (SELECT a, b, c , ROW_NUMBER() OVER (PARTITION BY > a ORDER BY <some column>) as rownum FROM t) WHERE rownum = 1; > > Regarding the aggregate functions allowing arbitrary values in a group, I > am not aware of databases that do that. I am not sure how to verify this > though. > > -Aman > > On Thu, Feb 22, 2018 at 10:06 PM, Aman Sinha <[email protected]> wrote: > >> I will try to find out more about this during the next few days....what do >> Oracle, SQL Server, DB2 support. >> >> On Wed, Feb 21, 2018 at 2:12 PM, Julian Hyde <[email protected]> wrote: >> >>> Can you do some research, and see if any other databases do anything >>> similar? Since Postgres isn’t standard, maybe we can improve upon it a bit. >>> >>> Also, do any databases have an aggregate function that takes an arbitrary >>> value in the group, or asserts that all values are the same? We would use >>> that when we translate the syntactic sugar to algebra. >>> >>> Julian >>> >>> >>>> On Feb 21, 2018, at 2:02 PM, Aman Sinha <[email protected]> wrote: >>>> >>>> It is a useful functionality, especially since the SELECT list can >>> contain >>>> mix of primitive types and array or map type columns. >>>> I do see your point about the Postgres semantics of ORDER BY vs DISTINCT >>>> ON. >>>> I don't like the hidden semantics of MySQL either. >>>> Perhaps the compromise solution is to introduce functions such as >>>> FIRST_ROW() aggregation function (similar to lead/lag window functions) >>> and >>>> use GROUP BY. >>>> >>>> -Aman >>>> >>>> On Wed, Feb 21, 2018 at 1:31 PM, Julian Hyde <[email protected]> wrote: >>>> >>>>> I can see that it would be useful. But there are a couple of things >>> about >>>>> it that are messy. >>>>> >>>>> I think the syntax is a bit clunky because it uses parentheses; a query >>>>> would be difficult to read if people would like multiple columns, >>>>> expressions, and aliases. >>>>> >>>>> I also think the semantics are messy. If you read >>>>> https://www.postgresql.org/docs/9.5/static/sql-select.html# >>> SQL-DISTINCT < >>>>> https://www.postgresql.org/docs/9.5/static/sql-select.html# >>> SQL-DISTINCT> >>>>> you will see that “DISTINCT ON” is evaluated after the ORDER BY clause, >>>>> whereas regular DISTINCT is evaluated before the ORDER BY clause. So >>> there >>>>> will be a bizarre interaction if DISTINCT ON is used with UNION and >>> ORDER >>>>> BY. >>>>> >>>>> By the way, good ol’ MySQL doesn’t have this problem; you can just >>> write >>>>> >>>>> SELECT a, b, c FROM t GROUP BY a >>>>> >>>>> and it just picks the first value of b and c. This “feature” is one of >>> my >>>>> least favorite things about MySQL, so let’s not emulate it. :) >>>>> >>>>> Julian >>>>> >>>>> >>>>>> On Feb 21, 2018, at 12:44 PM, Aman Sinha <[email protected]> >>> wrote: >>>>>> >>>>>> The DISTINCT 'ON' clause is not supported but I am trying to see if >>> there >>>>>> are other people who have run into this. One of the use cases I have >>>>>> intuitively maps to something like this: >>>>>> >>>>>> SELECT DISTINCT *ON (a)*, b, c FROM T >>>>>> >>>>>> Here suppose 'a' is an INT and b, c are some complex types such as >>>>>> array. In my example, the values of b and c happen to be the same in >>>>> all >>>>>> rows belonging to the same group of 'a', so I just want the first row. >>>>>> Since these are arrays, I cannot use MIN(b) GROUP BY a . Other >>>>>> alternative is to create a new aggregate function that picks the first >>>>>> value but it makes the syntax verbose especially with large number of >>>>> such >>>>>> columns. >>>>>> >>>>>> Incidentally, Postgres supports this [1] >>>>>> >>>>>> [1] >>>>>> https://www.postgresql.org/docs/current/static/sql- >>>>> select.html#SQL-DISTINCT >>>>>> >>>>>> >>>>>> -Aman >>>>> >>>>> >>> >>> >>
