jcsherin commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2229152203
When adding trace statements to the `nth_value` aggregate, I can see that the following methods are executed in order:

- `update_batch()`
- `state()`
- `merge_batch()`
- `evaluate()`

1. __nth_value always exists__ - In the query below, the 2nd item in `C13` is always present because every group in `a..=e` has a count of 18, 19 or 21.

```sql
SELECT
    C1
  , COUNT(C1) as n
  , NTH_VALUE(C13, 2 ORDER BY C1, C13 ASC) as nth -- get 2nd row
FROM aggregate_test_100
GROUP BY C1
ORDER BY C1;

+----+----+--------------------------------+
| c1 | n  | nth                            |
+----+----+--------------------------------+
| a  | 21 | Amn2K87Db5Es3dFQO9cw9cvpAM6h35 |
| b  | 19 | 6FPJlLAcaQ5uokyOWZ9HGdLZObFvOZ |
| c  | 21 | 6WfVFBVGJSQb7FhA7E0lBwdvjfZnSW |
| d  | 18 | 1aOcrEGd0cOqZe2I5XBOm0nDcwtBZO |
| e  | 21 | 3BEOHQsMEFZ58VcNTOJYShTBpAPzbt |
+----+----+--------------------------------+
5 row(s) fetched.
```

2. __nth_value is sometimes out of bounds__ - Since groups 'b' and 'd' have only 19 and 18 values respectively, we can set `N` to 20, which is out of bounds for those two groups.

```sql
SELECT
    C1
  , COUNT(C1) as n
  , NTH_VALUE(C13, 20 ORDER BY C1, C13 ASC) as nth -- get 20th row
FROM aggregate_test_100
GROUP BY C1
ORDER BY C1;

+----+----+--------------------------------+
| c1 | n  | nth                            |
+----+----+--------------------------------+
| a  | 21 | waIGbOGl1PM6gnzZ4uuZt4E2yDWRHs |
| b  | 19 |                                |
| c  | 21 | pLk3i59bZwd5KBZrI1FiweYTd5hteG |
| d  | 18 |                                |
| e  | 21 | ukOiFGGFnQJDHFgZxHMpvhD3zybF0M |
+----+----+--------------------------------+
5 row(s) fetched.
```

---

### Create Table

```sql
CREATE EXTERNAL TABLE aggregate_test_100 (
    c1  VARCHAR NOT NULL,
    c2  TINYINT NOT NULL,
    c3  SMALLINT NOT NULL,
    c4  SMALLINT,
    c5  INT,
    c6  BIGINT NOT NULL,
    c7  SMALLINT NOT NULL,
    c8  INT NOT NULL,
    c9  BIGINT UNSIGNED NOT NULL,
    c10 VARCHAR NOT NULL,
    c11 FLOAT NOT NULL,
    c12 DOUBLE NOT NULL,
    c13 VARCHAR NOT NULL
)
STORED AS CSV
-- path is relative to `datafusion-cli`
LOCATION '../testing/data/csv/aggregate_test_100.csv'
OPTIONS ('format.has_header' 'true');
```
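
The traced call order (`update_batch()`, `state()`, `merge_batch()`, `evaluate()`) is what one would expect from a two-phase aggregation, where partial accumulators export their state and a final accumulator merges it. Below is a minimal Rust sketch of that flow under stated assumptions: `ToyNthValue` and `nth_value_two_phase` are hypothetical names invented for illustration, they only mirror the traced method names, and they do not use DataFusion's actual `Accumulator` trait, Arrow arrays, or the real `nth_value` implementation.

```rust
/// Toy stand-in for a two-phase aggregate accumulator.
/// Hypothetical: this is NOT DataFusion's `Accumulator` trait.
struct ToyNthValue {
    /// 1-based position of the value to return.
    n: usize,
    /// Values seen so far, kept sorted ascending and capped at `n` entries.
    values: Vec<String>,
}

impl ToyNthValue {
    fn new(n: usize) -> Self {
        Self { n, values: Vec::new() }
    }

    /// Sort ascending and keep only the first `n` values, since later
    /// values can never be the n-th smallest.
    fn sort_and_cap(&mut self) {
        self.values.sort();
        self.values.truncate(self.n);
    }

    /// Partial phase: consume raw input rows.
    fn update_batch(&mut self, batch: &[&str]) {
        self.values.extend(batch.iter().map(|s| s.to_string()));
        self.sort_and_cap();
    }

    /// Partial phase: export the intermediate state.
    fn state(&self) -> Vec<String> {
        self.values.clone()
    }

    /// Final phase: merge intermediate states from partial accumulators.
    fn merge_batch(&mut self, states: &[Vec<String>]) {
        for state in states {
            self.values.extend(state.iter().cloned());
        }
        self.sort_and_cap();
    }

    /// Final phase: return the n-th value, or `None` when out of bounds.
    fn evaluate(&self) -> Option<String> {
        if self.n == 0 {
            return None;
        }
        self.values.get(self.n - 1).cloned()
    }
}

/// Runs the four calls in the traced order over a set of input partitions.
fn nth_value_two_phase(partitions: &[Vec<&str>], n: usize) -> Option<String> {
    // Partial phase: one accumulator per partition.
    let states: Vec<Vec<String>> = partitions
        .iter()
        .map(|rows| {
            let mut partial = ToyNthValue::new(n);
            partial.update_batch(rows); // 1. update_batch()
            partial.state() // 2. state()
        })
        .collect();

    // Final phase: a single accumulator merges all partial states.
    let mut final_acc = ToyNthValue::new(n);
    final_acc.merge_batch(&states); // 3. merge_batch()
    final_acc.evaluate() // 4. evaluate()
}
```

In this sketch, calling `nth_value_two_phase` with an `n` larger than the total number of rows returns `None`, which corresponds to the empty `nth` cells for groups 'b' and 'd' in the second query.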