We are building a system that will likely make heavy use of sorted data, and we are trying to figure out how to encode the metadata describing how that data is sorted. We can certainly use our own custom metadata fields, but we wanted to check for prior art and gauge community interest in adding something to Arrow. More details are in [1].
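To make the custom-metadata option concrete, here is a rough sketch of what it could look like. It is illustrative only, not a proposal for the format. It assumes the sort order is serialized as JSON under a single key in the schema's string-to-string custom metadata map. The key name `example.sort_order`, the JSON layout, and the helper functions are all hypothetical. A plain dict stands in for the schema metadata so the sketch has no Arrow dependency.

```python
import json

# Hypothetical metadata key; not part of the Arrow specification.
SORT_ORDER_KEY = "example.sort_order"


def encode_sort_order(metadata, sort_columns):
    """Return a copy of a string-to-string metadata map with the sort order added.

    sort_columns is a list of (column_name, descending, nulls_first) tuples,
    listed from the most significant sort key to the least significant.
    """
    payload = [
        {"column": name, "descending": descending, "nulls_first": nulls_first}
        for name, descending, nulls_first in sort_columns
    ]
    updated = dict(metadata)  # leave the caller's map untouched
    updated[SORT_ORDER_KEY] = json.dumps(payload)
    return updated


def decode_sort_order(metadata):
    """Return the recorded sort order, or None if no sort order was recorded."""
    raw = metadata.get(SORT_ORDER_KEY)
    if raw is None:
        return None
    return [
        (entry["column"], entry["descending"], entry["nulls_first"])
        for entry in json.loads(raw)
    ]


def is_sorted_by(metadata, wanted_columns):
    """True if the recorded sort columns start with wanted_columns.

    This compares column names only; a real check would also have to
    compare sort direction and null placement.
    """
    order = decode_sort_order(metadata)
    if order is None:
        return False
    recorded = [name for name, _, _ in order]
    return recorded[: len(wanted_columns)] == list(wanted_columns)


# Example: data sorted by host ascending, then time descending.
meta = encode_sort_order({}, [("host", False, False), ("time", True, False)])
print(is_sorted_by(meta, ["host"]))
```

The drawback of this route is that every producer and consumer has to agree on the key and the layout out of band, which is the main reason for asking whether something standard would be of interest.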
Recording the sort order in the Schema would likely be useful for DataFusion as well (to optimize away redundant computation if the data is already sorted, or to pick more efficient algorithms, e.g. a MERGING grouping operator).

I didn't see any obvious prior art on the mailing list [2] or in JIRA [3][4], so I figured I would ask if others had any backstory or other reactions.

Thank you,
Andrew

[1] https://github.com/apache/arrow-rs/issues/284
[2] https://lists.apache.org/list.html?dev@arrow.apache.org:lte=1y:sort
[3] https://issues.apache.org/jira/browse/ARROW-12087?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20summary%20~%20sort%20ORDER%20BY%20created%20DESC
[4] https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20description%20~%20sort%20and%20component%20in%20(format)