suremarc commented on issue #3922: URL: https://github.com/apache/arrow-rs/issues/3922#issuecomment-1482865710
> > Unless I am misunderstanding, I do not think it is possible to select the last N rows subject to a predicate with a RowSelection.
>
> This is actually possible, see https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/ for more background on how predicate pushdown works for parquet

I have read this article and am familiar with the RowSelection API. To my understanding, a RowSelection generated by a filter can rule out ranges based on the page statistics but cannot tell you how many matches for a predicate are actually in each page; it can only tell you that a page definitely has zero matches. In the worst case there might only be one match per page that wasn't pruned. So if I wanted to retrieve exactly N rows satisfying my predicate, I would have to include offsets from the last N pages of the column in the RowSelection, which is maximally pessimistic.

I apologize if I'm wrong, in which case I probably will look like a fool... nonetheless, I would love to be wrong on this particular issue.
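To make the worst case concrete, here is a minimal, self-contained sketch. It does not use the parquet crate or the real RowSelection API: `Page`, `example_pages`, `candidate_pages`, and `pages_decoded_for_last_n` are hypothetical names invented for illustration, and the predicate is assumed to be a simple equality check. It only models the idea that min/max statistics can exclude a page but cannot say how many rows inside a surviving page match.

```rust
/// Hypothetical stand-in for a data page: min/max statistics plus the raw values.
struct Page {
    min: i64,
    max: i64,
    values: Vec<i64>,
}

impl Page {
    fn new(values: Vec<i64>) -> Self {
        let min = *values.iter().min().expect("page must be non-empty");
        let max = *values.iter().max().expect("page must be non-empty");
        Page { min, max, values }
    }
}

/// Four pages whose statistics cover the value 5 but which hold only one
/// matching row each, followed by one page that statistics can rule out.
fn example_pages() -> Vec<Page> {
    let mut pages: Vec<Page> = (0..4).map(|_| Page::new(vec![0, 5, 10])).collect();
    pages.push(Page::new(vec![20, 30]));
    pages
}

/// Statistics-only pruning for the predicate `value == target`.
/// Returns the indices of pages that *may* contain a match.
fn candidate_pages(pages: &[Page], target: i64) -> Vec<usize> {
    pages
        .iter()
        .enumerate()
        .filter(|(_, p)| p.min <= target && target <= p.max)
        .map(|(i, _)| i)
        .collect()
}

/// Walks the candidate pages from the end and counts how many of them have
/// to be decoded before `n` matching rows have been seen.
fn pages_decoded_for_last_n(pages: &[Page], target: i64, n: usize) -> usize {
    let candidates = candidate_pages(pages, target);
    let mut found = 0;
    let mut decoded = 0;
    for &i in candidates.iter().rev() {
        if found >= n {
            break;
        }
        decoded += 1;
        found += pages[i].values.iter().filter(|&&v| v == target).count();
    }
    decoded
}

fn main() {
    let pages = example_pages();
    let candidates = candidate_pages(&pages, 5);
    let decoded = pages_decoded_for_last_n(&pages, 5, 3);
    println!(
        "{} candidate pages after pruning, {} decoded to find the last 3 matches",
        candidates.len(),
        decoded
    );
}
```

In this toy setup the statistics prune only the final page, and finding the last 3 matches requires decoding 3 pages, one per match, which is the pessimistic case described above.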
