westonpace commented on issue #34451:
URL: https://github.com/apache/arrow/issues/34451#issuecomment-1457178512

   > Not sure if there has been any other proposal of metadata management in 
arrow that should be leveraged.
   
   [Sortedness metadata has been discussed on the ML 
before](https://lists.apache.org/thread/xo7337dz73mx5lmfwrfw5bx5lp5mnzcf).  The 
reception seemed generally favorable though no proposal was ever put forward.  
[Similarly with min/max 
metadata](https://lists.apache.org/thread/r4v876hdprzyqgttdy0ntvoj5dpdnykg).
   
   In many systems, I believe sortedness is often recorded outside of the files 
themselves, as part of some catalog or table format (e.g. 
[Iceberg](https://iceberg.apache.org/spec/#sorting)).
   
   As for namespacing, there is some precedent in the 
[spec](https://arrow.apache.org/docs/format/Columnar.html#custom-application-metadata):
   
   > The colon symbol : is to be used as a namespace separator. It can be used 
multiple times in a key.
   > 
   > The ARROW pattern is a reserved namespace for internal Arrow use in the 
custom_metadata fields. For example, ARROW:extension:name.
   
   I don't think there is a need at the moment for an Acero namespace, as the 
only things (so far) that Acero would be interested in are sortedness, min/max 
statistics, and/or index information.  All of this should be universally 
applicable and ideally agreed upon across all implementations, not just in 
Acero (though we could start there while doing initial work, with the hope of 
making a proposal for the wider community).
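
   To make the namespacing rule quoted above concrete, here is a minimal sketch in plain Python (no Arrow library involved). It only shows how colon-separated keys split and how keys in the reserved `ARROW` namespace could be told apart from everything else. The `sort:*` and `stats:*` keys and all of the values are hypothetical placeholders made up for illustration; nothing like them has been proposed or agreed upon.

```python
# Minimal sketch of the colon-namespacing convention for custom_metadata keys.
# Only the "ARROW" reserved namespace and the ":" separator come from the spec
# text quoted above; every other key and value here is a hypothetical example.

RESERVED_NAMESPACE = "ARROW"


def split_key(key):
    """Split a metadata key into its namespace components."""
    return key.split(":")


def is_reserved(key):
    """Return True if the key's first component is the reserved namespace."""
    return split_key(key)[0] == RESERVED_NAMESPACE


custom_metadata = {
    "ARROW:extension:name": "example.type",  # key pattern from the spec; value is made up
    "sort:keys": "timestamp",                # hypothetical sortedness key
    "sort:order": "ascending",               # hypothetical sortedness key
    "stats:timestamp:min": "0",              # hypothetical min/max statistics key
}

reserved = sorted(k for k in custom_metadata if is_reserved(k))
application = sorted(k for k in custom_metadata if not is_reserved(k))

print("reserved:", reserved)
print("application:", application)
```

   Whether sortedness or statistics keys would eventually live under the `ARROW` namespace or somewhere else is exactly the kind of thing a wider proposal would need to settle; the sketch does not assume either outcome.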


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
