zhuqi-lucas commented on issue #19654:
URL: https://github.com/apache/datafusion/issues/19654#issuecomment-3747800779

   > Hey [@xudong963](https://github.com/xudong963) 👋🏻
   > 
   > I am working through it in [this](https://github.com/feniljain/datafusion/tree/feat-offset-pushdown) branch of my own fork; you can see the diff [here](https://github.com/apache/datafusion/compare/main...feniljain:datafusion:feat-offset-pushdown?expand=1) to get a better idea of the details. In short, I was able to push `offset` down to `TableScan` in the logical optimizer, and `ListingTable` can now use it to prune files!
   > 
   > Next, I want to work out how to use the remaining offset after file pruning in `file_stream` (or a similar place) where I can do the whole cycle of tracking the offset and reducing it by each file's row count. This second part needs to be done regardless, because when `filters` are pushed down we do not take `limit` and `offset` into consideration.
   > 
   > I was planning to open a draft POC PR for feedback after completing the `file_stream` integration and doing at least some manual tests 😅
   > 
   > I am guessing that by the design side you mean how to do it in the multi-file + filters scenario? Using it for pruning in `ListingTable` didn't seem that hard, since we get a single flat stream of files and I could stop skipping via the offset once the row count is reached. I am not exactly sure how it would work at the `DataSource` level yet. If you have any ideas, I would love to hear them 😄
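
   To make sure I understand the cycle you describe, here is a rough standalone sketch of the "reduce the remaining offset by each file's row count" idea. The `FileMeta` type and all names here are hypothetical stand-ins, not the actual `file_stream` types:

   ```rust
   // Hypothetical per-file metadata; in practice this would come from the
   // listing / statistics that `ListingTable` already has.
   struct FileMeta {
       path: String,
       row_count: usize,
   }

   /// For each file that must be (at least partially) read, return how many
   /// leading rows still have to be skipped in it. Files that fall entirely
   /// inside the offset are pruned and never opened.
   fn apply_offset(files: &[FileMeta], mut remaining_offset: usize) -> Vec<(usize, &FileMeta)> {
       let mut plan = Vec::new();
       for file in files {
           if remaining_offset >= file.row_count {
               // The whole file falls inside the offset: prune it entirely.
               remaining_offset -= file.row_count;
               continue;
           }
           // Skip `remaining_offset` rows of this file, then read the rest
           // (and every following file) in full.
           plan.push((remaining_offset, file));
           remaining_offset = 0;
       }
       plan
   }

   fn main() {
       let files = vec![
           FileMeta { path: "a.parquet".into(), row_count: 700_000 },
           FileMeta { path: "b.parquet".into(), row_count: 500_000 },
           FileMeta { path: "c.parquet".into(), row_count: 500_000 },
       ];
       for (skip, file) in apply_offset(&files, 1_000_000) {
           println!("read {} skipping its first {} rows", file.path, skip);
       }
       // a.parquet is pruned; b.parquet is read skipping 300,000 rows; c.parquet in full.
   }
   ```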
   
   
   Thanks @feniljain!
   
   I'm particularly interested in offset pushdown **combined with filters** - this could provide huge performance gains for a common pattern.
   
   **Key insight**: When we can prove through row group statistics that a filter will NOT filter any rows (e.g., `date >= '1900-01-01'` where all data is newer), we can safely use row group row counts to compute offset skipping.
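
   Here's a minimal, self-contained sketch of that "prove the filter is a no-op for this row group" check. The `RowGroupStats` struct, the days-since-epoch encoding, and the function name are hypothetical, not the actual Parquet/DataFusion metadata APIs:

   ```rust
   // Hypothetical, simplified row-group statistics; in practice the min/max and
   // row count would come from the Parquet row-group metadata.
   struct RowGroupStats {
       min_date: i32, // days since the Unix epoch, matching the column encoding
       max_date: i32, // unused by this particular check, kept for completeness
       num_rows: usize,
   }

   /// Returns true only when the statistics *prove* that `date > cutoff` keeps
   /// every row of the group, i.e. the filter cannot reduce its row count.
   fn filter_is_noop_for_group(stats: &RowGroupStats, cutoff_days: i32) -> bool {
       // If even the smallest date in the group is past the cutoff, all rows
       // satisfy `date > cutoff`, so `num_rows` can be used for offset
       // accounting without decoding any data.
       stats.min_date > cutoff_days
   }

   fn main() {
       // A group whose minimum date lies strictly after the cutoff passes wholesale.
       let group = RowGroupStats { min_date: 18_779, max_date: 19_000, num_rows: 65_536 };
       let cutoff = 18_262; // '2020-01-01' expressed in the same day-based encoding
       assert!(filter_is_noop_for_group(&group, cutoff));
       println!("row group contributes all {} rows to offset skipping", group.num_rows);
   }
   ```

   Whenever this check fails for a row group, its raw row count can no longer be trusted for offset accounting, and we'd fall back to the current behavior for that group.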
   
   **Our use case**: 
   - Time-sorted data (ORDER BY matches storage order)
   - Filter: `date > '2020-01-01'` (effectively a no-op; all data satisfies it)
   - Offset: 1,000,000
   - Row group size: 65,536
   
   **Current behavior**: Read 1,000,000+ rows, discard them
   
   **Optimized behavior** (a rough sketch in code follows this list):
   1. Check each row group's min/max statistics
   2. If `min(date) > '2020-01-01'`, the entire row group passes the filter
   3. Use row counts to skip ~15 complete row groups (15 × 65,536 = 983,040 rows)
   4. Only read from row group 16, skipping the remaining 16,960 offset rows inside it
   5. Total rows read: ~65,536 instead of 1,000,000+
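
   Here's a minimal sketch of the row-count arithmetic behind steps 3-4, assuming the no-op check above has already admitted these row groups (a standalone function with hypothetical names, not an actual DataFusion API):

   ```rust
   /// Given the row counts of row groups where the filter is proven to be a
   /// no-op, decide how many whole groups the offset lets us skip and how many
   /// rows still need to be skipped inside the first group we actually read.
   fn plan_row_group_skip(row_group_counts: &[usize], offset: usize) -> (usize, usize) {
       let mut remaining = offset;
       let mut groups_skipped = 0;
       for &count in row_group_counts {
           if remaining < count {
               break; // this group has to be (partially) read
           }
           remaining -= count;
           groups_skipped += 1;
       }
       (groups_skipped, remaining)
   }

   fn main() {
       // The example above: OFFSET 1,000,000 over 65,536-row groups.
       let counts = vec![65_536usize; 32];
       let (skipped, leftover) = plan_row_group_skip(&counts, 1_000_000);
       assert_eq!(skipped, 15);      // 15 × 65,536 = 983,040 rows skipped for free
       assert_eq!(leftover, 16_960); // skipped inside row group 16, the first one read
       println!("skip {skipped} whole row groups, then {leftover} more rows");
   }
   ```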
   
   This is safe because we can **prove** the filter doesn't reduce row counts for those row groups.
   
   What's your opinion on this approach? Thanks!
   
   
   
   

