suremarc commented on issue #3922:
URL: https://github.com/apache/arrow-rs/issues/3922#issuecomment-1482865710

   > > Unless I am misunderstanding, I do not think it is possible to select
   > > the last N rows subject to a predicate with a RowSelection.
   >
   > This is actually possible, see
   > https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/
   > for more background on how predicate pushdown works for parquet
   
   I have read this article and am familiar with the RowSelection API. As I 
understand it, a RowSelection generated by a filter can rule out ranges based 
on the page statistics, but it cannot tell you how many matches for a predicate 
are actually in each page; it can only tell you that a page definitely has zero 
matches. In the worst case, each page that wasn't pruned might contain only one 
match. So if I wanted to retrieve exactly N rows satisfying my predicate, I 
would have to include the row ranges of the last N pages of the column in the 
RowSelection, which is maximally pessimistic. 
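   
   To make the worst case concrete, here is a toy sketch in plain Python. It is 
not the parquet crate API: `Page`, `prune_by_stats`, and `last_n_matches` are 
hypothetical names, and the predicate is assumed to be `value > threshold`. It 
only models what min/max statistics can and cannot prove.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    """Hypothetical stand-in for a data page; not a real parquet type."""
    values: List[int]

    @property
    def max_value(self) -> int:
        # The only statistic this sketch needs for a `value > threshold` predicate.
        return max(self.values)


def prune_by_stats(pages: List[Page], threshold: int) -> List[int]:
    """Indices of pages that *may* contain a value > threshold.

    Statistics can prove a page has zero matches (max <= threshold), but say
    nothing about how many matches a surviving page holds.
    """
    return [i for i, p in enumerate(pages) if p.max_value > threshold]


def last_n_matches(pages: List[Page], threshold: int, n: int) -> Tuple[List[int], int]:
    """Walk surviving pages from the end, decoding until n matches are found.

    Returns (last n matches in file order, number of pages decoded).
    """
    if n <= 0:
        return [], 0
    found: List[int] = []
    decoded = 0
    for i in reversed(prune_by_stats(pages, threshold)):
        decoded += 1
        hits = [v for v in pages[i].values if v > threshold]
        found = hits + found  # prepend to keep file order
        if len(found) >= n:
            break
    return found[-n:], decoded


# Worst case: four pages of 100 rows with exactly one match each,
# followed by one page with no matches at all.
pages = [Page(list(range(99)) + [1000 + i]) for i in range(4)]
pages.append(Page(list(range(100))))

print(prune_by_stats(pages, 500))     # [0, 1, 2, 3]
print(last_n_matches(pages, 500, 3))  # ([1001, 1002, 1003], 3)
```

   Under these assumptions the statistics prune only the final page, and 
getting the last 3 matches still means decoding 3 full pages (300 rows), since 
nothing in the statistics says where inside a page the match sits.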
   
   I apologize if I'm wrong, in which case I will probably look like a fool... 
Nonetheless, I would love to be wrong on this particular issue. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
