There are various open source columnar database engines you could look at to get inspiration for a varargs variant of sort_indices.
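As a rough illustration of what a multi-column sort_indices has to do, here is a pure-Python sketch that builds a lexicographic ordering out of a stable single-column argsort. It sorts by the last column first and the first column last, so each pass only needs one column plus the permutation produced so far. `sort_indices_single`, `take`, and `sort_indices_multi` are hypothetical stand-ins, not Arrow functions. An engine may instead compare all keys in a single pass.

```python
from typing import List, Sequence


def sort_indices_single(values: Sequence[int]) -> List[int]:
    """Stable argsort of one column (hypothetical stand-in for sort_indices)."""
    return sorted(range(len(values)), key=values.__getitem__)


def take(values: Sequence[int], indices: Sequence[int]) -> List[int]:
    """Gather values at the given positions (hypothetical stand-in for take)."""
    return [values[i] for i in indices]


def sort_indices_multi(*columns: Sequence[int]) -> List[int]:
    """Lexicographic argsort over several equal-length columns (hypothetical).

    Sorts by the last column first and the first column last. Every pass is
    stable, so rows that tie on an earlier column keep the order established
    by the later columns.
    """
    if not columns:
        return []
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all columns must have the same length")
    order = list(range(n))
    for col in reversed(columns):
        # Argsort this column as currently permuted, then compose permutations.
        local = sort_indices_single(take(col, order))
        order = take(order, local)
    return order


if __name__ == "__main__":
    firsts = [2, 1, 2, 1]
    seconds = [15, 15, 10, 10]
    idx = sort_indices_multi(firsts, seconds)
    print(idx)  # prints [3, 1, 2, 0]
    print(list(zip(take(firsts, idx), take(seconds, idx))))
    # prints [(1, 10), (1, 15), (2, 10), (2, 15)]
```

Applying the single resulting index array to every column (a "take" per column) keeps the tuples together. With only one set of sort indices per column, the tuples would not stay together.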
On Thu, Sep 3, 2020 at 9:26 AM Ben Kietzman <ben.kietz...@rstudio.com> wrote:
>
> Hi Rares,
>
> The arrow API does not currently support sorting against multiple columns.
> We'd welcome a JIRA/PR to add that support.
>
> One potential workaround is storing the tuple as a single column of
> fixed_size_list(int32, 2), which could then be viewed [1] as int64 (for
> which sorting is supported). Would that accommodate your use case?
>
> Ben
>
> [1]:
> https://github.com/apache/arrow/blob/e1e3188/cpp/src/arrow/array/array_base.h#L132-L138
>
> On Thu, Sep 3, 2020 at 8:26 AM Rares Vernica <rvern...@gmail.com> wrote:
>
> > Hello,
> >
> > I have a set of integer tuples that need to be collected and sorted at a
> > coordinator. Here is an example with tuples of length 2:
> >
> > [(1, 10),
> >  (1, 15),
> >  (2, 10),
> >  (2, 15)]
> >
> > I am considering storing each column in an Arrow array, e.g., [1, 1, 2, 2]
> > and [10, 15, 10, 15], and have the Arrow arrays grouped in a Record Batch.
> > Then I would serialize, transfer, and deserialize each record batch. The
> > coordinator would collect all the record batches and concatenate them.
> > Finally, the coordinator needs to sort the tuples by value in the
> > sequential order of the columns, e.g., (1, 10), (1, 15), (2, 10).
> >
> > Could I accomplish the sort using the Arrow API? I looked at sort_indices
> > but it does not work on record batches. With a set of sort indices for each
> > array, sorting the tuples does not seem to be straightforward, right?
> >
> > Thanks!
> > Rares
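Here is a small pure-Python sketch of the fixed_size_list(int32, 2) viewed as int64 workaround quoted above, using `struct` to mimic the reinterpretation. The helper names are hypothetical, not Arrow APIs. It assumes the little-endian layout Arrow normally uses. Under that layout the first list slot becomes the low 32 bits of the int64 and the second slot becomes the high 32 bits. So to order by the first tuple element, that element has to be stored in the second slot. The element in the low slot must also be non-negative, because it is effectively compared as unsigned.

```python
import struct
from typing import List, Tuple


def pair_as_int64(first: int, second: int) -> int:
    """Reinterpret two adjacent little-endian int32 slots as one int64.

    Mimics viewing a fixed_size_list(int32, 2) value [first, second] as int64:
    `first` ends up in the low 32 bits, `second` in the high 32 bits.
    """
    return struct.unpack("<q", struct.pack("<ii", first, second))[0]


def sort_pairs_via_int64(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort (a, b) tuples lexicographically using a single int64 key.

    Stores the primary column `a` in the second (high) slot and the secondary
    column `b` in the first (low) slot. Assumes b >= 0, since the low half
    compares as unsigned.
    """
    keyed = [(pair_as_int64(b, a), (a, b)) for a, b in pairs]
    keyed.sort(key=lambda kv: kv[0])
    return [ab for _, ab in keyed]


if __name__ == "__main__":
    data = [(2, 15), (1, 15), (2, 10), (1, 10)]
    # Naive layout [a, b]: the int64 key orders by b first, then a.
    naive = sorted(data, key=lambda ab: pair_as_int64(ab[0], ab[1]))
    print(naive)  # prints [(1, 10), (2, 10), (1, 15), (2, 15)]
    # Swapped layout [b, a]: orders by a first, then b.
    print(sort_pairs_via_int64(data))  # prints [(1, 10), (1, 15), (2, 10), (2, 15)]
```

If the secondary column can be negative, one option is to flip its sign bit before packing (XOR with 0x80000000), so that unsigned order matches signed order. This sketch does not do that.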