alamb opened a new issue, #12423:
URL: https://github.com/apache/datafusion/issues/12423

   ### Describe the bug
   
   https://github.com/apache/datafusion/pull/12003 introduced a bug where 
sometimes the `OFFSET` clause is accidentally removed
   
   I am filing this ticket to track the regression / know it exists
   
   ### To Reproduce
   
   @wiedld  created a reproducer here 
https://github.com/apache/datafusion/pull/12399/commits/b6fd751b9d5b2168740c622041d00c6b54eadbfb
   
   Add this to the `order.slt` file:
   
   ```sql
   # all results
   query II
   SELECT b, sum(a) FROM ordered_table GROUP BY b order by b desc;
   ----
   3 25
   2 25
   1 0
   0 0
   
   # limit only
   query II
   SELECT b, sum(a) FROM ordered_table GROUP BY b order by b desc LIMIT 3;
   ----
   3 25
   2 25
   1 0
   
   # offset only
   query II
   SELECT b, sum(a) FROM ordered_table GROUP BY b order by b desc OFFSET 1;
   ----
   2 25
   1 0
   0 0
   
   # TODO: fix this to properly apply offset
   # offset + limit
   query II
   SELECT b, sum(a) FROM ordered_table GROUP BY b order by b desc OFFSET 1 
LIMIT 2;
   ----
   3 25
   2 25
   1 0
   
   # TODO: fix this to not remove the skip=1 during the limit pushdown
   # Applying offset & limit when multiple streams from groupby
   query TT
   EXPLAIN SELECT b, sum(a) FROM ordered_table GROUP BY b order by b desc 
OFFSET 1 LIMIT 2;
   ----
   logical_plan
   01)Limit: skip=1, fetch=2
   02)--Sort: ordered_table.b DESC NULLS FIRST, fetch=3
   03)----Aggregate: groupBy=[[ordered_table.b]], 
aggr=[[sum(CAST(ordered_table.a AS Int64))]]
   04)------TableScan: ordered_table projection=[a, b]
   physical_plan
   01)SortPreservingMergeExec: [b@0 DESC], fetch=3
   02)--SortExec: TopK(fetch=3), expr=[b@0 DESC], preserve_partitioning=[true]
   03)----AggregateExec: mode=FinalPartitioned, gby=[b@0 as b], 
aggr=[sum(ordered_table.a)]
   04)------CoalesceBatchesExec: target_batch_size=8192
   05)--------RepartitionExec: partitioning=Hash([b@0], 2), input_partitions=2
   06)----------AggregateExec: mode=Partial, gby=[b@1 as b], 
aggr=[sum(ordered_table.a)]
   07)------------RepartitionExec: partitioning=RoundRobinBatch(2), 
input_partitions=1
   08)--------------CsvExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[a, b], 
has_header=true
   
   # TODO: fix this to not remove the skip=4 during the limit pushdown
   # Applying offset & limit when multiple streams from union
   query TT
   explain select * FROM (
     select c FROM ordered_table
     UNION ALL
     select d FROM ordered_table
   ) order by 1 desc LIMIT 10 OFFSET 4;
   ----
   logical_plan
   01)Limit: skip=4, fetch=10
   02)--Sort: ordered_table.c DESC NULLS FIRST, fetch=14
   03)----Union
   04)------Projection: CAST(ordered_table.c AS Int64) AS c
   05)--------TableScan: ordered_table projection=[c]
   06)------Projection: CAST(ordered_table.d AS Int64) AS c
   07)--------TableScan: ordered_table projection=[d]
   physical_plan
   01)SortPreservingMergeExec: [c@0 DESC], fetch=14
   02)--UnionExec
   03)----SortExec: TopK(fetch=14), expr=[c@0 DESC], 
preserve_partitioning=[true]
   04)------ProjectionExec: expr=[CAST(c@0 AS Int64) as c]
   05)--------RepartitionExec: partitioning=RoundRobinBatch(2), 
input_partitions=1
   06)----------CsvExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[c], 
output_ordering=[c@0 ASC NULLS LAST], has_header=true
   07)----SortExec: TopK(fetch=14), expr=[c@0 DESC], 
preserve_partitioning=[true]
   08)------ProjectionExec: expr=[CAST(d@0 AS Int64) as c]
   09)--------RepartitionExec: partitioning=RoundRobinBatch(2), 
input_partitions=1
   10)----------CsvExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/core/tests/data/window_2.csv]]}, projection=[d], 
has_header=true
   
   # ApplyingmLIMIT & OFFSET to subquery.
   query III
   select t1.b, c, c2 FROM (
     select b, c FROM ordered_table ORDER BY b desc, c desc OFFSET 1 LIMIT 4
   ) as t1 INNER JOIN (
     select b, c as c2 FROM ordered_table ORDER BY b desc, d desc OFFSET 1 
LIMIT 4
   ) as t2
   ON t1.b = t2.b
   ORDER BY t1.b desc, c desc, c2 desc;
   ----
   3 98 96
   3 98 89
   3 98 82
   3 98 79
   3 97 96
   3 97 89
   3 97 82
   3 97 79
   3 96 96
   3 96 89
   3 96 82
   3 96 79
   3 95 96
   3 95 89
   3 95 82
   3 95 79
   
   # TODO: fix this does not correctly work.
   # Apply OFFSET & LIMIT to both parent and child (subquery).
   query III
   select t1.b, c, c2 FROM (
     select b, c FROM ordered_table ORDER BY b desc, c desc OFFSET 1 LIMIT 4
   ) as t1 INNER JOIN (
     select b, c as c2 FROM ordered_table ORDER BY b desc, d desc OFFSET 1 
LIMIT 4
   ) as t2
   ON t1.b = t2.b
   ORDER BY t1.b desc, c desc, c2 desc
   OFFSET 3 LIMIT 2;
   ----
   3 99 96
   3 99 89
   3 99 87
   3 99 82
   3 99 79
   ```
   
   ### Expected behavior
   
   Query should not lose the offset
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to