westonpace commented on issue #34451: URL: https://github.com/apache/arrow/issues/34451#issuecomment-1457178512
> Not sure if there has been any other proposal of metadata management in arrow that should be leveraged.

[Sortedness metadata has been discussed on the ML before](https://lists.apache.org/thread/xo7337dz73mx5lmfwrfw5bx5lp5mnzcf). The reception seemed generally favorable, though no proposal was ever put forward. [Similarly with min/max metadata](https://lists.apache.org/thread/r4v876hdprzyqgttdy0ntvoj5dpdnykg).

In many systems, I believe sortedness is often recorded outside of the files themselves as part of some catalog or table format (e.g. [Iceberg](https://iceberg.apache.org/spec/#sorting)).

As for namespacing, there is some precedent in the [spec](https://arrow.apache.org/docs/format/Columnar.html#custom-application-metadata):

> The colon symbol `:` is to be used as a namespace separator. It can be used multiple times in a key.
>
> The `ARROW` pattern is a reserved namespace for internal Arrow use in the `custom_metadata` fields. For example, `ARROW:extension:name`.

I don't think there is a need at the moment for an Acero namespace, as the only things (so far) that Acero would be interested in are sortedness, min/max statistics, and/or index information. All of this should be universally applicable and ideally agreed upon across all implementations, not just in Acero (though we could start there while doing initial work, with the hope of making a proposal for the wider community).
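
To make the namespacing convention quoted from the spec concrete, here is a minimal Python sketch that splits `custom_metadata` keys on `:` and treats a leading `ARROW` part as reserved. The helper names (`split_key`, `is_reserved`, `application_keys`) and the `myapp:sort_order` key are hypothetical, made up for illustration; no sortedness key has been agreed upon. Matching `ARROW` exactly and case-sensitively is also an assumption.

```python
# Sketch only: models custom_metadata as a plain dict of string keys/values.
RESERVED_NAMESPACE = "ARROW"  # reserved for internal Arrow use, per the spec


def split_key(key):
    """Split a metadata key into its colon-separated namespace parts."""
    return key.split(":")


def is_reserved(key):
    """Return True if the key's first namespace part is the reserved one.

    Assumption: the match is exact and case-sensitive.
    """
    return split_key(key)[0] == RESERVED_NAMESPACE


def application_keys(metadata):
    """Return only the entries that are outside the reserved namespace."""
    return {k: v for k, v in metadata.items() if not is_reserved(k)}


custom_metadata = {
    # Key taken from the spec's own example; the value is a placeholder.
    "ARROW:extension:name": "example.extension",
    # Hypothetical application-level key, not part of any spec.
    "myapp:sort_order": "timestamp ASC",
}

for key in custom_metadata:
    print(key, split_key(key), "reserved" if is_reserved(key) else "application")
```

If sortedness or min/max metadata were eventually standardized across implementations, it would presumably live under the reserved namespace rather than an application one, which is why the example keeps the hypothetical key outside `ARROW`.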
