[
https://issues.apache.org/jira/browse/ARROW-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570853#comment-17570853
]
Vibhatha Lakmal Abeykoon commented on ARROW-17183:
--------------------------------------------------
{quote}
In general, if you want something to run efficiently on some specific
architecture, you can't expect to rely solely upon a generic optimizer with no
knowledge of that specific architecture. So I don't think you can ever avoid
Acero-specific optimizations. I do however think that you can get ~90% of the
Acero-specific optimizations done in the same tree traversal that you need for
the Substrait to Acero conversion anyway, as the more complex grunt work of
pushing filters and projections through joins and such would already have been
done at the Substrait level. So that's what I've been proposing.
{quote}
I am not aware of any existing effort in this direction, so Acero may be
limited in functionality as it currently stands. I don't know whether it makes
sense to write an optimizer in Acero. IMHO, a better approach would be to write
the Acero query, convert it to a Substrait plan, and then optimize that plan
with a third-party optimizer. Maybe there will be something like a
substrait-optimizer in the future (I really don't know). The optimized plan
could then be used to create the Acero plan again. The question is whether such
optimization is possible with a third-party optimizer; if it is, and it takes
less time, that would be ideal, wouldn't it? Writing an Acero-native optimizer
could be a separate project in itself. Since there are so many mature
optimizers, can't we use one to optimize the sub-optimal plan? My knowledge of
query optimization is not very strong, so I wouldn't argue much about it.
{quote}
I don't think this orthogonality is a bad thing, actually. However, if the goal
is to become a fully-featured query engine, using Substrait or otherwise, you
do at least need to satisfy the expectations that come with it. Anecdotal, but
in every database I've ever queried, doing the same query twice returned the
results in the same order.
{quote}
Yes, if Acero expects to inherit all these core features of a database, it
must do what they are supposed to do; no argument there. Since Acero is a
streaming execution engine, how far we are reaching for those goals is not yet
clear to me. But at the end of the day, if we are benchmarking our performance
against other systems, it would be best to support such features in as
optimized a way as possible.
> [C++] Adding ExecNode with Sort and Fetch capability
> ----------------------------------------------------
>
> Key: ARROW-17183
> URL: https://issues.apache.org/jira/browse/ARROW-17183
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Vibhatha Lakmal Abeykoon
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
>
> In Substrait integrations with Acero, a required piece of functionality is
> the ability to fetch records, both sorted and unsorted.
> A Fetch operation is defined as selecting `K` records starting at an offset,
> for instance picking 10 records after skipping the first 5. We can define
> this as a Slice operation, and the records can be easily extracted in a
> sink node.
> A Sort and Fetch operation applies when we need to execute a Fetch operation
> on sorted data. The main issue is that we cannot have a sort node followed by
> a fetch node. The reason is that all existing node definitions supporting
> sort are based on sink nodes. Since no node can follow a sink node, this
> functionality has to take place in a single node.
> This is not a perfect solution for fetch and sort, but one way to do it is to
> define a sink node where the records are sorted and then a set of items is
> fetched.
> Another dilemma is what happens if sort is followed by a fetch. In that case,
> there has to be a flag to select the order of the operations.
> The objective of this ticket is to discuss a viable, efficient solution and
> include new nodes or a method to execute such logic.
>
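To make the semantics in the description above concrete, here is a minimal
sketch in plain Python. It is not Acero code and uses no Arrow API: the
function names `fetch` and `sort_and_fetch` and the `sort_first` flag are
hypothetical, chosen only to illustrate "select K records with an offset" and
the idea of a flag that decides the order of the two operations.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def fetch(rows: Sequence[T], offset: int, count: int) -> List[T]:
    """Skip `offset` rows, then return at most `count` rows (a slice)."""
    if offset < 0 or count < 0:
        raise ValueError("offset and count must be non-negative")
    return list(rows[offset:offset + count])


def sort_and_fetch(
    rows: Sequence[T],
    key: Callable[[T], object],
    offset: int,
    count: int,
    sort_first: bool = True,  # hypothetical flag selecting the operation order
) -> List[T]:
    """Combine sort and fetch in one step, as a single node would have to."""
    if sort_first:
        # Sort all rows, then slice the sorted result.
        return fetch(sorted(rows, key=key), offset, count)
    # Slice the unsorted input first, then sort only the fetched rows.
    return sorted(fetch(rows, offset, count), key=key)


if __name__ == "__main__":
    data = [7, 3, 9, 1, 8, 2, 6, 4, 5, 0]
    print(fetch(data, offset=5, count=3))
    print(sort_and_fetch(data, key=lambda x: x, offset=5, count=3))
    print(sort_and_fetch(data, key=lambda x: x, offset=5, count=3,
                         sort_first=False))
```

The two orders generally give different results, which is why the order would
need to be stated explicitly. Note that sorting first requires all input rows
before anything can be emitted, whereas fetching first only needs the rows up
to `offset + count`.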
--
This message was sent by Atlassian Jira
(v8.20.10#820010)