comphead commented on code in PR #9419:
URL: https://github.com/apache/arrow-datafusion/pull/9419#discussion_r1510019100
##########
datafusion/sqllogictest/test_files/window.slt:
##########
@@ -4169,36 +4169,96 @@ select lag(a, 2, null) ignore nulls over (order by id
desc) as x1,
sum(id) over (order by id desc ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING) as sum_id
from (select 2 id, 'b' a union all select 1 id, null a union all select 3 id,
null union all select 4 id, 'x')
+# LEAD window function IGNORE/RESPECT NULLS support with ascending order and
default offset 1
+query TTTTTT
+select lead(a) ignore nulls over (order by id) as x,
+ lead(a, 1, null) ignore nulls over (order by id) as x1,
+ lead(a, 1, 'def') ignore nulls over (order by id) as x2,
+ lead(a) respect nulls over (order by id) as x3,
+ lead(a, 1, null) respect nulls over (order by id) as x4,
+ lead(a, 1, 'def') respect nulls over (order by id) as x5
+from (select 2 id, 'b' a union all select 1 id, null a union all select 3 id,
null union all select 4 id, 'x')
+----
+b b b b b b
+x x x NULL NULL NULL
+x x x x x x
+NULL NULL def NULL NULL def
+
+# LEAD window function IGNORE/RESPECT NULLS support with descending order and
default offset 1
+query TTTTTT
+select lead(a) ignore nulls over (order by id desc) as x,
+ lead(a, 1, null) ignore nulls over (order by id desc) as x1,
+ lead(a, 1, 'def') ignore nulls over (order by id desc) as x2,
+ lead(a) respect nulls over (order by id desc) as x3,
+ lead(a, 1, null) respect nulls over (order by id desc) as x4,
+ lead(a, 1, 'def') respect nulls over (order by id desc) as x5
+from (select 2 id, 'b' a union all select 1 id, null a union all select 3 id,
null union all select 4 id, 'x')
+----
+b b b NULL NULL NULL
+b b b b b b
+NULL NULL def NULL NULL NULL
+NULL NULL def NULL NULL def
+
+# LEAD window function IGNORE/RESPECT NULLS support with ascending order and
nondefault offset
+query TTTT
+select lead(a, 2, null) ignore nulls over (order by id) as x1,
+ lead(a, 2, 'def') ignore nulls over (order by id) as x2,
+ lead(a, 2, null) respect nulls over (order by id) as x4,
+ lead(a, 2, 'def') respect nulls over (order by id) as x5
+from (select 2 id, 'b' a union all select 1 id, null a union all select 3 id,
null union all select 4 id, 'x')
+----
+x x NULL NULL
+NULL def x x
+NULL def NULL def
+NULL def NULL def
+
# LEAD window function IGNORE/RESPECT NULLS support with descending order and
nondefault offset
-statement error Execution error: IGNORE NULLS mode for LEAD is not supported
for BoundedWindowAggExec
+query TTTT
select lead(a, 2, null) ignore nulls over (order by id desc) as x1,
lead(a, 2, 'def') ignore nulls over (order by id desc) as x2,
lead(a, 2, null) respect nulls over (order by id desc) as x4,
lead(a, 2, 'def') respect nulls over (order by id desc) as x5
from (select 2 id, 'b' a union all select 1 id, null a union all select 3 id,
null union all select 4 id, 'x')
+----
+NULL def b b
+NULL def NULL NULL
+NULL def NULL def
+NULL def NULL def
+
+# LEAD window function IGNORE/RESPECT NULLS support with descending order and
nondefault offset.
+# To trigger WindowAggExec, we added a sum window function with all of the
ranges.
+statement error Execution error: IGNORE NULLS mode for LAG and LEAD is not
supported for WindowAggExec
+select lead(a, 2, null) ignore nulls over (order by id desc) as x1,
+ lead(a, 2, 'def') ignore nulls over (order by id desc) as x2,
+ lead(a, 2, null) respect nulls over (order by id desc) as x4,
+ lead(a, 2, 'def') respect nulls over (order by id desc) as x5,
+ sum(id) over (order by id desc ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING) as sum_id
+from (select 2 id, 'b' a union all select 1 id, null a union all select 3 id,
null union all select 4 id, 'x')
statement ok
set datafusion.execution.batch_size = 1000;
-query I
-SELECT LAG(c1, 2) IGNORE NULLS OVER()
+query II
+SELECT LAG(c1, 2) IGNORE NULLS OVER(),
+ LEAD(c1, 2) IGNORE NULLS OVER()
FROM null_cases
ORDER BY c2
LIMIT 5;
----
-78
-63
-3
-24
-14
+78 50
+63 38
+3 53
+24 31
+14 94
# result should be same with above, when lag algorithm work with pruned data.
# decreasing batch size, causes data to be produced in smaller chunks at the
source.
# Hence sliding window algorithm is used during calculations.
+# LEAD function to be added, there is some bug with incoming data array when
LEAD is called
Review Comment:
I'll address this in following PR, what I found is the incoming data is
incomplete for LEAD.
so for `set datafusion.execution.batch_size = 8000;`
the incoming array is complete
```
[datafusion/physical-expr/src/window/lead_lag.rs:280:17] &array = StringArray
[
"x",
null,
"b",
null,
]
```
and for `set datafusion.execution.batch_size = 1;`
the data is partially missed
```
[datafusion/physical-expr/src/window/lead_lag.rs:280:17] &array = StringArray
[
"x",
null,
"b"
]
```
Probably the error is in `built_in.rs` I'll check it in following PR
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]