jcsherin commented on issue #11433:
URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2229152203

   After adding trace statements to the `nth_value` aggregate, I can see that the 
following methods are executed in this order:
   - `update_batch()`
   - `state()`
   - `merge_batch()`
   - `evaluate()`
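To make the call order above concrete, here is a minimal Python sketch of how a per-group accumulator could move through those four phases. `NthValueSketch` is a hypothetical, simplified stand-in that only reuses the phase names from the trace. It is not DataFusion's Rust `Accumulator` trait, and its state (all values, sorted) is an assumption chosen for clarity.

```python
from typing import List, Optional


class NthValueSketch:
    """Hypothetical, simplified NTH_VALUE accumulator for a single group.

    Method names mirror the phases seen in the trace; this is NOT
    DataFusion's `Accumulator` trait, only an illustration of the flow.
    """

    def __init__(self, n: int) -> None:
        self.n = n                    # 1-based position requested by NTH_VALUE
        self.values: List[str] = []   # values seen so far for this group

    def update_batch(self, batch: List[str]) -> None:
        # Phase 1: ingest raw input rows for this group.
        self.values.extend(batch)

    def state(self) -> List[str]:
        # Phase 2: emit intermediate state (here: all values, sorted ascending).
        return sorted(self.values)

    def merge_batch(self, other_state: List[str]) -> None:
        # Phase 3: fold in the state produced by another partial accumulator.
        self.values.extend(other_state)

    def evaluate(self) -> Optional[str]:
        # Phase 4: final result; None (SQL NULL) when the group has < n rows.
        ordered = self.state()
        if 1 <= self.n <= len(ordered):
            return ordered[self.n - 1]
        return None


# Two partial accumulators for the same group (e.g. two input partitions).
partial_a, partial_b = NthValueSketch(2), NthValueSketch(2)
partial_a.update_batch(["d", "b"])
partial_b.update_batch(["c", "a"])

# The final accumulator merges the partial states, then evaluates.
final_acc = NthValueSketch(2)
final_acc.merge_batch(partial_a.state())
final_acc.merge_batch(partial_b.state())
print(final_acc.evaluate())  # b
```

Carrying the ordering in the state is what lets `evaluate()` pick the correct position after partial results from different partitions are merged.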
   
   1. __nth_value always exists__ - In the query below, the 2nd value of `C13` is 
always present because every group `a..=e` has at least 18 rows (the counts are 
18, 19 or 21).
   ```sql
   SELECT C1
          , COUNT(C1) as n
          , NTH_VALUE(C13, 2 ORDER BY C1, C13 ASC) as nth -- get 2nd row
       FROM aggregate_test_100
      GROUP BY C1
      ORDER BY C1;
   +----+----+--------------------------------+
   | c1 | n  | nth                            |
   +----+----+--------------------------------+
   | a  | 21 | Amn2K87Db5Es3dFQO9cw9cvpAM6h35 |
   | b  | 19 | 6FPJlLAcaQ5uokyOWZ9HGdLZObFvOZ |
   | c  | 21 | 6WfVFBVGJSQb7FhA7E0lBwdvjfZnSW |
   | d  | 18 | 1aOcrEGd0cOqZe2I5XBOm0nDcwtBZO |
   | e  | 21 | 3BEOHQsMEFZ58VcNTOJYShTBpAPzbt |
   +----+----+--------------------------------+
   5 row(s) fetched.
   
   ```
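   2. __nth_value is sometimes out of bounds__ - Groups 'b' and 'd' have only 19 
and 18 values respectively, so setting `N` to 20 puts the requested position out of 
bounds for those two groups, and `nth` is NULL (shown as empty) for them.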
   ```sql
   SELECT C1
          , COUNT(C1) as n
          , NTH_VALUE(C13, 20 ORDER BY C1, C13 ASC) as nth -- get 20th row
       FROM aggregate_test_100
      GROUP BY C1
      ORDER BY C1;
   +----+----+--------------------------------+
   | c1 | n  | nth                            |
   +----+----+--------------------------------+
   | a  | 21 | waIGbOGl1PM6gnzZ4uuZt4E2yDWRHs |
   | b  | 19 |                                |
   | c  | 21 | pLk3i59bZwd5KBZrI1FiweYTd5hteG |
   | d  | 18 |                                |
   | e  | 21 | ukOiFGGFnQJDHFgZxHMpvhD3zybF0M |
   +----+----+--------------------------------+
   5 row(s) fetched.
   
   ```
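The bounds rule behind the two results can be checked with a small Python sketch. It reuses the group sizes from the `COUNT(C1)` column above, but the `C13` strings are synthetic placeholders, not the real contents of `aggregate_test_100`. The `nth_value` helper is hypothetical and assumes a 1-based position over ascending order, returning `None` for NULL.

```python
from typing import Dict, List, Optional

# Group sizes taken from the COUNT(C1) column above; the values themselves are
# synthetic placeholders, NOT the real C13 data from aggregate_test_100.
group_sizes = {"a": 21, "b": 19, "c": 21, "d": 18, "e": 21}
groups: Dict[str, List[str]] = {
    g: [f"{g}_{i:02d}" for i in range(size)] for g, size in group_sizes.items()
}


def nth_value(values: List[str], n: int) -> Optional[str]:
    """Hypothetical 1-based NTH_VALUE over ascending order; None stands for NULL."""
    ordered = sorted(values)
    return ordered[n - 1] if 1 <= n <= len(ordered) else None


for n in (2, 20):
    missing = [g for g, vals in sorted(groups.items()) if nth_value(vals, n) is None]
    print(f"N={n}: NULL for groups {missing}")
# N=2: NULL for groups []
# N=20: NULL for groups ['b', 'd']
```

Any group whose row count is smaller than `N` yields NULL, which matches the empty `nth` cells for 'b' and 'd'.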
   ---
   ### Create Table
   
   ```sql
   CREATE EXTERNAL TABLE aggregate_test_100 (
     c1  VARCHAR NOT NULL,
     c2  TINYINT NOT NULL,
     c3  SMALLINT NOT NULL,
     c4  SMALLINT,
     c5  INT,
     c6  BIGINT NOT NULL,
     c7  SMALLINT NOT NULL,
     c8  INT NOT NULL,
     c9  BIGINT UNSIGNED NOT NULL,
     c10 VARCHAR NOT NULL,
     c11 FLOAT NOT NULL,
     c12 DOUBLE NOT NULL,
     c13 VARCHAR NOT NULL
   )
   STORED AS CSV
   -- path is relative to `datafusion-cli`
   LOCATION '../testing/data/csv/aggregate_test_100.csv'
   OPTIONS ('format.has_header' 'true');
   ```
   
   

