suremarc commented on issue #17172: URL: https://github.com/apache/datafusion/issues/17172#issuecomment-3211643923
> [@crepererum](https://github.com/crepererum) Thank you for bringing up the idea.
>
> My colleague [@suremarc](https://github.com/suremarc) has written a related issue in the arrow-rs repo: [apache/arrow-rs#3922](https://github.com/apache/arrow-rs/issues/3922). And currently, we have an implementation for this.
>
> Looking forward to collaborating.

Yes, this was a pretty gnarly issue, and we ended up writing a reverse Parquet reader that reads entire row groups into memory one-by-one and reverses each one in memory (using the Arrow `take` kernel as mentioned in this thread). Then we have a `ReverseOrder` optimizer that runs before `EnforceSorting` and looks for opportunities to reverse a Parquet scan if doing so would eliminate sorts. (On that note, it would be nice if DataFusion execution plans supported sort pushdown; then we wouldn't have to implement a custom optimizer.)

Reversing entire row groups feels like a bad solution in general because the row groups can be extremely large depending on how the Parquet file is written. Decoding page by page would be a great improvement, but [apache/arrow-rs#3922](https://github.com/apache/arrow-rs/issues/3922) calls out some practical difficulties with implementing this.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
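As a rough illustration of the row-group-at-a-time approach described above, here is a minimal stdlib-only Rust sketch. It is not the actual reader and does not use the arrow-rs API: row groups are modeled as plain `Vec`s, and `take`, `reverse_row_group`, and `reverse_scan` are hypothetical helper names. The `take` here only mimics the gather behavior of Arrow's `take` kernel.

```rust
/// Gather `values[i]` for each `i` in `indices`.
/// Stand-in for a "take" kernel; panics if an index is out of bounds.
fn take<T: Clone>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i].clone()).collect()
}

/// Reverse one in-memory row group by taking indices n-1, n-2, ..., 0.
fn reverse_row_group<T: Clone>(row_group: &[T]) -> Vec<T> {
    let indices: Vec<usize> = (0..row_group.len()).rev().collect();
    take(row_group, &indices)
}

/// Walk the row groups last-to-first and reverse each one lazily,
/// so only one reversed row group is produced at a time.
/// Assumption: each inner `Vec` models one already-decoded row group.
fn reverse_scan<'a, T: Clone>(
    row_groups: &'a [Vec<T>],
) -> impl Iterator<Item = Vec<T>> + 'a {
    row_groups.iter().rev().map(|rg| reverse_row_group(rg))
}
```

For example, row groups `[1, 2, 3]`, `[4, 5]`, `[6]` come back as `[6]`, `[5, 4]`, `[3, 2, 1]`. Peak memory still depends on the largest row group, which is the drawback noted above.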