crepererum opened a new issue, #17172: URL: https://github.com/apache/datafusion/issues/17172
### Is your feature request related to a problem or challenge? Often parquet files are written in time-ascending order, i.e. new values are appended at the end. However many users would like to know the last value(s) of a given context. Currently this means that DataFusion would scan the entire parquet file and just keep the relevant bit at the end. However that seems wasteful and I think we can do better. ### Describe the solution you'd like The solution here overlaps with things that we would need to implement in the arrow-rs repository, but I am using an umbrella ticket to illustrate the rough idea. ## Part 1 -- Order Inversion If a parquet file is order `col1 ASC, col2 DESC, col3 ASC`, then reading it "back to front" should result in data ordered `col1 DESC, col2 ASC, col3 DESC`, i.e. the sequence of expressions is the same, but the polarity (ascending or descending) of each expression is flipped. So the DataFusion planner should (after all of this is implemented) that it is better to inverse the order and potentially apply a top-k on that than reading an entire file and perform a simple `Sort` operation. ## Part 2 -- Decode Parquet Pages in Reverse Order Ideally we would be able to decode the parquet files backwards, i.e. "last value first". However this only works for a subset of [encodings](https://parquet.apache.org/docs/file-format/data-pages/encodings/), while others are clearly order-dependent. But if we look at a parquet file, this order only matters on a page level and technically pages (of a column) can be decoded in reverse order. That is true for ALL encodings. ## Part 3 -- Flip a Page Reading the pages in reverse order however is still not enough. Imagine the following layout: ```yaml pages: - values: - A - B - C - values: - D - E ``` Decoding pages in reverse order would yield `D, E, A, B C`. That's not right and we need to reverse the order within a page. Now my hypothesis is that "flipping" a page would only require inverting the order of an arrow buffer and in contrast to the generic [`take`](https://docs.rs/arrow/latest/arrow/compute/kernels/take/fn.take.html) kernel, this can be implemented in-place and without random access. This highly simplified code for an integer array shows the principle: ```rust fn flip_u64_array(array: &mut [u64]) { // if you do this iterator instead of index-based, // you can even get rid of the bounds-checking for idx in 0..(array.len() / 2) { array.swap(idx, array.len() - idx); } } ``` One can imagine how this even works for more complex types: - dictionary encoded data: flip the array that contains the per-row dict keys, but NOT the actual dict - nested data (like list): flip top-level array, but not the children (i.e. the lists themselves are NOT flipped) - run-offset encoded data: flip both the offset and the value array - string views: flip the offset array but NOT the values A `flip` kernel would even be useful for DataFusion in general (outside the parquet context), i.e. when we know that we can use this optimization instead of a full `sort`. ### Describe alternatives you've considered \- ### Additional context \- -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org