drin commented on issue #34451:
URL: https://github.com/apache/arrow/issues/34451#issuecomment-1457211006

   > [Sortedness metadata has been discussed on the ML before](https://lists.apache.org/thread/xo7337dz73mx5lmfwrfw5bx5lp5mnzcf)
   
   This is super helpful. This is the relevant PR for DataFusion: 
https://github.com/apache/arrow-datafusion/pull/1776. @alamb, if you have 
extra input, it'd be nice to hear it.
   
   
   > As for namespacing, there is some precedent in the [spec](https://arrow.apache.org/docs/format/Columnar.html#custom-application-metadata)
   
   I think this addresses the aspect I was most concerned with, though 
namespace reservation hadn't really occurred to me (it would be awkward if 
multiple projects claimed the same namespace). In the short term this is 
probably safe to treat as a non-issue, so Acero can claim an arbitrary 
namespace when it's reasonable to do so.
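
A rough sketch of what a namespaced sortedness key could look like, using plain Python dicts to stand in for Arrow key-value metadata. The `acero` namespace, the `sort_order` key name, and the JSON value layout are all hypothetical here; nothing in this thread settles on them. The only fixed point is that the spec reserves the `ARROW` namespace for internal use.

```python
import json

# The Arrow spec reserves the "ARROW" namespace for internal use.
RESERVED_NAMESPACES = {"ARROW"}


def namespaced_key(namespace: str, name: str) -> str:
    """Build a 'namespace:name' metadata key, refusing reserved namespaces."""
    if namespace in RESERVED_NAMESPACES:
        raise ValueError(f"namespace {namespace!r} is reserved")
    if ":" in namespace:
        raise ValueError("namespace must not contain ':'")
    return f"{namespace}:{name}"


def sortedness_metadata(sort_keys):
    """Encode (column, direction) pairs under a hypothetical 'acero:sort_order' key."""
    value = json.dumps(
        [{"column": column, "direction": direction} for column, direction in sort_keys]
    )
    return {namespaced_key("acero", "sort_order"): value}


metadata = sortedness_metadata([("timestamp", "ascending")])
```

A consumer that doesn't know the `acero` namespace can simply ignore the key, which is the main appeal of prefixing.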
   
   
   > In many systems I believe sortedness is often recorded outside of the 
files themselves as part of some catalog or table format
   
   The interesting thing here is that for Substrait plans that reference a 
table (e.g. `NamedTable`), perhaps there will (or should) be an operator that 
just merges metadata. Metadata managed by different systems may be maintained 
in separate catalogs or locations and would need to be merged at some point 
during query execution.
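
To make the "merge metadata" idea a bit more concrete, here is a minimal sketch of what such a step might do, again with plain dicts standing in for key-value metadata. `merge_metadata`, the key names, and the values are made up for illustration; this is not an existing Acero or Substrait operator, and failing on conflicts is just one possible policy.

```python
def merge_metadata(*sources):
    """Merge key-value metadata from several sources, failing on conflicts."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            # Identical duplicates are fine; differing values are a conflict.
            if key in merged and merged[key] != value:
                raise ValueError(f"conflicting values for metadata key {key!r}")
            merged[key] = value
    return merged


# Hypothetical inputs: one dict read from the file, one from an external catalog.
file_metadata = {"acero:sort_order": "timestamp ASC"}
catalog_metadata = {"example_catalog:partitioning": "by_day"}

merged = merge_metadata(file_metadata, catalog_metadata)
```

With namespaced keys, two systems only collide if they write to the same namespace, so the conflict check should rarely fire in practice.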

