Vibhatha Lakmal Abeykoon created ARROW-17183:
------------------------------------------------
Summary: [C++] Adding ExecNode with Sort and Fetch capability
Key: ARROW-17183
URL: https://issues.apache.org/jira/browse/ARROW-17183
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Vibhatha Lakmal Abeykoon
Assignee: Vibhatha Lakmal Abeykoon
In Substrait integrations with Acero, one required capability is fetching
records from both sorted and unsorted input.
A Fetch operation selects `K` records after skipping a given offset. For
instance, pick 10 records after skipping the first 5. This can be defined as a
Slice operation, and the records can easily be extracted in a sink node.
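Because records arrive as a stream of batches, the offset and the count can both span batch boundaries, so a Fetch has to carry state from one batch to the next. The following is a minimal sketch under stated assumptions: `Batch` (a single int64 column stored in a `std::vector`) and `FetchState` are hypothetical stand-ins, not Acero's API, and a real node would operate on Arrow record batches instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a record batch: a single int64 column.
using Batch = std::vector<int64_t>;

// Streaming Fetch (offset + count) over batches that arrive in order.
// Illustrative only; this is not an Acero class.
class FetchState {
 public:
  FetchState(std::size_t offset, std::size_t count)
      : to_skip_(offset), remaining_(count) {}

  // Returns the part of `batch` that belongs to the Fetch output. It is
  // empty if the whole batch falls inside the offset or if `count` rows
  // have already been emitted.
  Batch Consume(const Batch& batch) {
    // Skip as many leading rows as are still owed to the offset.
    std::size_t start = std::min(to_skip_, batch.size());
    to_skip_ -= start;
    // Take at most `remaining_` of the rows left in this batch.
    std::size_t take = std::min(remaining_, batch.size() - start);
    remaining_ -= take;
    auto first = batch.begin() + static_cast<std::ptrdiff_t>(start);
    return Batch(first, first + static_cast<std::ptrdiff_t>(take));
  }

  // True once `count` rows have been emitted, so upstream input could stop.
  bool Finished() const { return remaining_ == 0; }

 private:
  std::size_t to_skip_;    // rows still to be skipped for the offset
  std::size_t remaining_;  // rows still to be emitted
};
```

With `FetchState fetch(5, 10)` and batches holding the values 0 to 16, the first batch `{0, 1, 2, 3}` yields nothing and the concatenated output is the values 5 to 14. Since no sorting is involved, this form of Fetch can run in a streaming fashion and stop early.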
A Sort and Fetch operation applies when a Fetch must be executed on sorted
data. The main issue is that we cannot have a sort node followed by a fetch
node, because all existing node definitions that support sorting are sink
nodes. Since no node can follow a sink node, this functionality has to take
place in a single node.
This is not a perfect solution for fetch and sort, but one way to do it is to
define a sink node in which the records are sorted and a set of items is then
fetched.
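A sink that sorts and then fetches must buffer every row before it can emit anything, but it only needs the first `offset + count` positions in sorted order. The sketch below illustrates that idea with `std::partial_sort`; `Batch` and `SortAndFetchSink` are hypothetical names, the sort key is assumed to be a single ascending int64 column, and this is one possible approach rather than how Acero implements it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a record batch: a single int64 column.
using Batch = std::vector<int64_t>;

// Sketch of a sink that buffers all input, sorts it ascending, and then
// fetches rows [offset, offset + count) of the sorted order.
class SortAndFetchSink {
 public:
  SortAndFetchSink(std::size_t offset, std::size_t count)
      : offset_(offset), count_(count) {}

  // Sorting needs every row, so incoming batches are only buffered here.
  void Consume(const Batch& batch) {
    rows_.insert(rows_.end(), batch.begin(), batch.end());
  }

  // Called once the input is exhausted.
  Batch Finish() {
    std::size_t begin = std::min(offset_, rows_.size());
    std::size_t end = begin + std::min(count_, rows_.size() - begin);
    auto first = rows_.begin();
    // Only the first `end` positions must be in sorted order, so a partial
    // sort is enough instead of sorting every buffered row.
    std::partial_sort(first, first + static_cast<std::ptrdiff_t>(end),
                      rows_.end());
    return Batch(first + static_cast<std::ptrdiff_t>(begin),
                 first + static_cast<std::ptrdiff_t>(end));
  }

 private:
  std::size_t offset_;
  std::size_t count_;
  Batch rows_;  // all buffered input rows
};
```

For example, feeding `{9, 3, 7}`, `{1, 8}` and `{5, 2, 6, 4, 0}` into `SortAndFetchSink(2, 3)` yields `{2, 3, 4}`. This still keeps all rows in memory; bounding memory to `offset + count` rows (for example with a heap) would be a further refinement.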
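Another dilemma is the order of the two operations, for example when a fetch
is followed by a sort rather than a sort followed by a fetch. In that case,
there has to be a flag that specifies the order in which the operations are
applied.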
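To see why the order of the operations matters, the sketch below applies the same Sort and Fetch in both orders to fully buffered rows. `SortFetchOrder`, `Fetch` and `SortAndFetch` are hypothetical names chosen for illustration, and the sort is assumed to be ascending on a single int64 column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a record batch: a single int64 column.
using Batch = std::vector<int64_t>;

// Hypothetical flag selecting the order in which Sort and Fetch apply.
enum class SortFetchOrder { kSortThenFetch, kFetchThenSort };

// Keeps rows [offset, offset + count) of `rows`, clamped to its size.
Batch Fetch(const Batch& rows, std::size_t offset, std::size_t count) {
  std::size_t begin = std::min(offset, rows.size());
  std::size_t end = begin + std::min(count, rows.size() - begin);
  return Batch(rows.begin() + static_cast<std::ptrdiff_t>(begin),
               rows.begin() + static_cast<std::ptrdiff_t>(end));
}

// Applies an ascending Sort and a Fetch in the order chosen by `order`.
Batch SortAndFetch(Batch rows, std::size_t offset, std::size_t count,
                   SortFetchOrder order) {
  if (order == SortFetchOrder::kSortThenFetch) {
    std::sort(rows.begin(), rows.end());
    return Fetch(rows, offset, count);
  }
  // Fetch first: the selected rows depend on the input order.
  Batch fetched = Fetch(rows, offset, count);
  std::sort(fetched.begin(), fetched.end());
  return fetched;
}
```

With rows `{9, 3, 7, 1, 8, 5}`, offset 1 and count 3, sort-then-fetch returns `{3, 5, 7}`, while fetch-then-sort returns `{1, 3, 7}`. In the fetch-then-sort case only `count` rows need sorting, but the result depends on the arrival order of the input, so it is deterministic only if that order is.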
The objective of this ticket is to discuss a viable, efficient solution and to
add new nodes, or another mechanism, to execute this logic.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)