Re: [PR] Support IGNORE NULLS for LAG window function [arrow-datafusion]

via GitHub Tue, 20 Feb 2024 22:37:14 -0800


mustafasrepo commented on PR #9221:
URL: 
https://github.com/apache/arrow-datafusion/pull/9221#issuecomment-1955987551


   > The only thing concerns me when I run tests I didn't see `.evaluate_all` 
has been called
   
   `evaluate_all` is called from `WindowAggExec`. `WindowAggExec` works when 
one of the window frame boundaries include `UNBOUNDED FOLLOWING`. Hence for the 
query below
   ```sql
   SELECT LAG(c9, 2) OVER(ORDER BY c9), SUM(c9) OVER(order by c9 ROWS BETWEEN 
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
   FROM aggregate_test_100;
   ```
   `WindowAggExec` will be used. Unfortunately, from a SQL query I couldn't 
come up with a simpler reproducer.
   
   
   >  One idea I had is to reverse input array before calling evaluate for only 
LEAD function and it potentially should work, but there reversing might be 
expensive
   Another approach might be for the table, we can construct a vector with same 
length to track non null count. Such as for the table below
   
   a |
   -- |
   1 
   2
   null
   3
   null
   4
   
   we would construct vector
   non_null_count |
   -- |
   1 
   2
   2
   3
   3
   4
   
   where count is incremented each time a new non-null value is seen.
   `LAG`, `LEAD` can work on this `vector` to determine the place of the value. 
For instance, for the `row index 5` in the original table (where `a=4`) 
   `LAG(2)` should produce `2`. We can determine this by finding 
`non_null_count` of the row (which is 4 in our case). When we determine `LAG` 
on this table we can determine that `non_null_count` of the result should be 
`2`. Then we can find the index of first `2` in the `non_null_count` vector. 
(Other `2`s point to `null` values.). Index would be `1` which has value `a=2` 
in the table. 
   `LEAD` can work similarly. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Support IGNORE NULLS for LAG window function [arrow-datafusion]

Reply via email to