drin commented on issue #34451: URL: https://github.com/apache/arrow/issues/34451#issuecomment-1457211006
> [Sortedness metadata has been discussed on the ML before](https://lists.apache.org/thread/xo7337dz73mx5lmfwrfw5bx5lp5mnzcf)

This is super helpful. This is the relevant PR for DataFusion: https://github.com/apache/arrow-datafusion/pull/1776. @alamb, if you have extra input, it'd be nice to hear.

> As for namespacing, there is some precedent in the [spec](https://arrow.apache.org/docs/format/Columnar.html#custom-application-metadata)

I think this addresses the aspect I was most concerned with, though namespace reservation hadn't totally occurred to me (it would be awkward if multiple projects claimed conflicting namespaces). In the short term this can probably be assumed not to be a problem, so Acero can claim an arbitrary namespace when it's reasonable to do so.

> In many systems I believe sortedness is often recorded outside of the files themselves as part of some catalog or table format

The interesting thing here seems to be that for Substrait plans that reference a table (e.g. `NamedTable`), perhaps there will (or should) be an operator that just merges metadata: metadata managed by different systems may be maintained in separate catalogs or locations and need to be merged at some point during query execution.
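To make the metadata-merging idea a bit more concrete, here is a minimal Python sketch under a few assumptions. It treats each catalog's metadata as a plain string-to-string mapping (like Arrow's custom key/value metadata). It uses `:` as the namespace separator and treats `ARROW` as reserved, following the spec section linked above. It also refuses to merge when two sources disagree on the same key, which is one way to surface the "conflicting namespaces" problem instead of silently picking a winner. The function names and the `acero:sort_keys` / `othercatalog:sort_keys` keys are hypothetical and are not an existing Acero or Substrait API.

```python
from typing import Dict, List, Mapping

# The Arrow spec reserves the "ARROW" namespace for internal use.
RESERVED_NAMESPACE = "ARROW"


class MetadataConflict(ValueError):
    """Raised when two sources disagree on the value of the same key."""


def namespace_of(key: str) -> str:
    # ':' is the namespace separator; the namespace is the first segment.
    if ":" not in key:
        raise ValueError(f"key {key!r} is not namespaced")
    return key.split(":", 1)[0]


def merge_metadata(sources: List[Mapping[str, str]]) -> Dict[str, str]:
    """Merge key/value metadata from several catalogs into one mapping.

    Hypothetical policy for this sketch: every key must be namespaced and
    must not use the reserved ARROW namespace; identical values for the
    same key are accepted, differing values raise MetadataConflict.
    """
    merged: Dict[str, str] = {}
    for source in sources:
        for key, value in source.items():
            if namespace_of(key) == RESERVED_NAMESPACE:
                raise ValueError(f"key {key!r} uses the reserved ARROW namespace")
            if key in merged and merged[key] != value:
                raise MetadataConflict(
                    f"conflicting values for {key!r}: {merged[key]!r} vs {value!r}"
                )
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical sortedness entries kept by two different systems.
    acero_catalog = {"acero:sort_keys": "ts ASC"}
    other_catalog = {"othercatalog:sort_keys": "id DESC"}
    print(merge_metadata([acero_catalog, other_catalog]))
```

Rejecting conflicts outright is only one possible policy; a real operator might instead prefer one catalog or keep per-source values, which is exactly the kind of question namespace reservation would need to settle.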
