alamb opened a new issue, #7354: URL: https://github.com/apache/arrow-datafusion/issues/7354
### Is your feature request related to a problem or challenge?

As of now, you can

1. create an external table (implemented by `ListingTable`) that points at a local directory and insert data into it, which creates new files
2. create an external table (implemented by `ListingTable`) that points at a local directory with a declared sort order, and DataFusion will take advantage of that order!

Sadly, you cannot do both together: insert data into an external table that has a declared sort order.

For example:

```shell
$ mkdir output
$ datafusion-cli
```

```sql
DataFusion CLI v29.0.0
❯ create external table output(time timestamp) stored as parquet location 'output' with order (time);
0 rows in set. Query took 0.002 seconds.

❯ insert into output values (now());
This feature is not implemented: Writing to a sorted listing table via insert into is not supported yet. To write to this table in the meantime, register an equivalent table with file_sort_order = vec![]
```

### Describe the solution you'd like

From @devinjdangelo's comments in https://github.com/apache/arrow-datafusion/issues/6569#issuecomment-1683790637

In the case of appending new files to a directory, I think it is as simple as having `FileSinkExec` require its input be sorted. DataFusion's optimizer should do the rest to ensure the new file is sorted properly.

The case of a single file (`LOCATION 'foo.parquet'` for example) likely can't be handled efficiently, as doing so would require reading the existing file, merging that with the new data, and rewriting the whole file.

### Describe alternatives you've considered

Alternatively, we could have a check to see if 1) the table is sorted and 2) the input to `FileSinkExec` is sorted. If 1) is true but 2) is not, we would need to update the metadata about the table to indicate that, for subsequent queries, it is no longer guaranteed to be sorted.

### Additional context

_No response_
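To make the two options in this issue concrete, here is a minimal sketch of the decision logic in plain Rust. Everything in it (`Target`, `InsertPlan`, `plan_insert`, `order_after_insert`) is a hypothetical name invented for illustration; none of it is a DataFusion API, and the real change would live in the planner and in `FileSinkExec`.

```rust
// Hypothetical model of the two strategies discussed in this issue.
// None of these types or functions exist in DataFusion.

/// Where the external table's LOCATION points.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    /// A directory: each insert writes a new file.
    Directory,
    /// A single file, e.g. LOCATION 'foo.parquet'.
    SingleFile,
}

/// What the planner would do for an `INSERT INTO` under the proposed solution.
#[derive(Debug, Clone, PartialEq, Eq)]
enum InsertPlan {
    /// The table declares no order, so the input is written as-is.
    WriteUnsorted,
    /// The sink requires its input sorted on these columns; the optimizer
    /// is then expected to add a sort if the input is not already ordered.
    RequireSortedInput(Vec<String>),
    /// Keeping a single file sorted would mean reading, merging, and
    /// rewriting the whole file, so the insert is rejected.
    NotSupported,
}

/// Proposed solution: derive the sink's requirement from the declared order.
fn plan_insert(target: Target, declared_order: &[String]) -> InsertPlan {
    if declared_order.is_empty() {
        return InsertPlan::WriteUnsorted;
    }
    match target {
        Target::Directory => InsertPlan::RequireSortedInput(declared_order.to_vec()),
        Target::SingleFile => InsertPlan::NotSupported,
    }
}

/// Alternative: allow the write, but return the order the table may still
/// claim afterwards. Unsorted input into a sorted table clears the order.
fn order_after_insert(declared_order: &[String], input_is_sorted: bool) -> Vec<String> {
    if input_is_sorted {
        declared_order.to_vec()
    } else {
        Vec::new()
    }
}

fn main() {
    let order = vec!["time".to_string()];
    let unordered: Vec<String> = Vec::new();

    // Proposed solution, for each kind of table.
    println!("{:?}", plan_insert(Target::Directory, &order));
    println!("{:?}", plan_insert(Target::SingleFile, &order));
    println!("{:?}", plan_insert(Target::Directory, &unordered));

    // Alternative: the table loses its ordering guarantee after an unsorted write.
    println!("{:?}", order_after_insert(&order, false));
}
```

The first function puts the burden on the write path, so readers can keep trusting the declared order. The second keeps writes cheap but gives up the ordering guarantee for subsequent queries.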
