alamb opened a new issue, #4169: URL: https://github.com/apache/arrow-datafusion/issues/4169
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

IOx stores parquet files in a particular sort order, and then uses the fact that the data is sorted for a variety of sort-related optimizations.

The new `BasicEnforcement` rule added in https://github.com/apache/arrow-datafusion/pull/4122 by @mingmwang is (correctly) deciding that, since the `ParquetExec` declares its output is not sorted, it needs to add a `SortExec`. That sort is unnecessary in our case and will slow performance dramatically.

I think the way to avoid this is to teach DataFusion that the output of the `ParquetExec` is actually sorted (which it is), and then everything will work out.

**Describe the solution you'd like**

I would like a way for someone constructing a `ParquetExec` manually to be able to specify that the data is already sorted.

**Describe alternatives you've considered**

It might be possible to figure out the sort order of the data from the parquet metadata, but I haven't looked into that carefully.

**Additional context**

As a bonus, I think at least some part of our plan construction logic in IOx that adds `SortExec`s to sort the data could potentially be removed, as it is now covered by the DataFusion optimizer. See more detail at https://github.com/influxdata/influxdb_iox/pull/6108#discussion_r1019387151
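
To make the request under "Describe the solution you'd like" more concrete, here is a minimal, self-contained sketch of the behavior being asked for. It is a toy model, not DataFusion code: `Plan`, `SortKey`, `declared_ordering`, `satisfies`, `enforce_ordering`, and `key` are all hypothetical names invented for illustration, and the real `ParquetExec` and `BasicEnforcement` rule are considerably more involved. The only point is that a scan which declares its ordering lets an enforcement rule skip the extra sort.

```rust
// Hypothetical, simplified model for illustration only.
// None of these types or functions are DataFusion's real API.

/// One sort key: a column name plus a direction.
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
}

/// A toy physical plan with just the two nodes needed here.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A file scan that may declare the order its data is already stored in.
    Scan { declared_ordering: Option<Vec<SortKey>> },
    /// An explicit sort placed on top of another plan.
    Sort { keys: Vec<SortKey>, input: Box<Plan> },
}

impl Plan {
    /// The ordering this node guarantees for its output, if any.
    fn output_ordering(&self) -> Option<&[SortKey]> {
        match self {
            Plan::Scan { declared_ordering } => declared_ordering.as_deref(),
            Plan::Sort { keys, .. } => Some(keys.as_slice()),
        }
    }
}

/// True when `required` is a prefix of the ordering that `provided` guarantees.
/// A node with no declared ordering only satisfies an empty requirement.
fn satisfies(provided: Option<&[SortKey]>, required: &[SortKey]) -> bool {
    provided.map_or(required.is_empty(), |p| p.starts_with(required))
}

/// Toy enforcement rule: wrap `input` in a sort only when its declared
/// ordering does not already satisfy `required`.
fn enforce_ordering(input: Plan, required: Vec<SortKey>) -> Plan {
    if satisfies(input.output_ordering(), &required) {
        input
    } else {
        Plan::Sort { keys: required, input: Box::new(input) }
    }
}

/// Convenience constructor for an ascending sort key.
fn key(column: &str) -> SortKey {
    SortKey { column: column.to_string(), descending: false }
}
```

In this model, calling `enforce_ordering` on a `Plan::Scan` whose `declared_ordering` already covers the required keys returns the scan unchanged, while a scan with `declared_ordering: None` comes back wrapped in `Plan::Sort`. The second case is the situation described above, where the scan does not declare the order its files are stored in.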
