Re: [Discuss] Storing metadata about the "sortedness" of data

Andy Grove Tue, 11 May 2021 11:01:20 -0700

I had been planning on adding a method to DataFusion's execution plan to
indicate the sort-order of the results (if known), similar to how we
currently have information about output partitioning.
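
A rough sketch of what such a method might look like, and how a planner
could use it to skip a redundant sort. Every name here (SortKey, PlanNode,
output_ordering, Scan, sort_is_redundant) is hypothetical and chosen for
illustration; none of this is DataFusion's actual API.

```rust
// Hypothetical sketch: these types are NOT DataFusion's real API.

/// One sort key: a column name plus direction and null placement.
#[derive(Debug, Clone, PartialEq)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
    pub nulls_first: bool,
}

/// Stand-in for an execution plan node.
pub trait PlanNode {
    /// Sort order of the rows this node produces, or `None` if unknown.
    fn output_ordering(&self) -> Option<&[SortKey]>;
}

/// A scan over data whose sort order (if any) was recorded in metadata.
pub struct Scan {
    pub ordering: Option<Vec<SortKey>>,
}

impl PlanNode for Scan {
    fn output_ordering(&self) -> Option<&[SortKey]> {
        self.ordering.as_deref()
    }
}

/// True if the plan's known ordering starts with `required`, in which
/// case a downstream sort on `required` would be redundant.
pub fn sort_is_redundant(plan: &dyn PlanNode, required: &[SortKey]) -> bool {
    match plan.output_ordering() {
        Some(existing) => existing.starts_with(required),
        None => false,
    }
}
```

The prefix check is deliberately conservative: an unknown ordering is
treated as unsorted, so the sort is only dropped when the metadata
positively says the data is already in the required order.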


Would this cover your requirement or are you looking for something outside
the context of execution plans?

On Tue, May 11, 2021 at 11:52 AM Andrew Lamb <[email protected]> wrote:

> We are building a system that will likely make heavy use of sorted data,
> and we are trying to figure out how to encode the metadata of "how is this
> data sorted". We can certainly use our own custom metadata fields, but
> wanted to check for prior art and gauge community interest in adding
> something to Arrow. More details are on [1].
>
> Recording sort order in the Schema would likely be useful for DataFusion as
> well, either to optimize away redundant computation when the data is already
> sorted or to pick more efficient algorithms (e.g. a MERGING grouping operator).
>
> I didn't see any obvious prior art on the mailing list [2] or in JIRA
> [3][4] so I figured I would ask if others had any backstory or other
> reactions.
>
> Thank you
> Andrew
>
> [1] https://github.com/apache/arrow-rs/issues/284
> [2] https://lists.apache.org/[email protected]:lte=1y:sort
> [3] https://issues.apache.org/jira/browse/ARROW-12087?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20summary%20~%20sort%20ORDER%20BY%20created%20DESC
> [4] https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20description%20~%20sort%20and%20component%20in%20(format)
>
