xudong963 opened a new issue, #18860:
URL: https://github.com/apache/datafusion/issues/18860

   Let's see an example:
   ```
   SELECT * FROM t1 WHERE ticker = 'ABT' AND window_start <= 
(1761364799999999999 AT TIME ZONE 'America/New_York') AND window_start >= 
(1698465600000000000 AT TIME ZONE 'America/New_York')  LIMIT 1
   ```
   After the kinds of pruning inside DF, we can get some matched row groups.
   
   Assume that there are still four row groups remaining after row group 
pruning by `ticker = 'ABT' AND window_start <= (1761364799999999999 AT TIME 
ZONE 'America/New_York') AND window_start >= (1698465600000000000 AT TIME ZONE 
'America/New_York')` and related statistics in row group metadata. And one (row 
group 3) of the four is fully matched with the predicates, and the others are 
not, which are only partially matched. 
   
   The worst case is that we need to scan rg1, rg2, then find a matched row in 
rg3. But suppose we know the rg3 is fully matched (currently, DF only supports 
partial-matched). In that case, we can only check it if the total row count 
across all the rg3 meets or exceeds the threshold defined by k in the LIMIT 
clause, then we definitely can reduce the scanned data and skip other rows.
   
   If there are no fully matched rg, we can extend limit pruning to the page 
level, to find the fully matched pages.
   
   ---
   
   I plan to support row group level limit pruning first, then explore the page 
level
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to