[ 
https://issues.apache.org/jira/browse/ARROW-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570853#comment-17570853
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-17183:
--------------------------------------------------

{quote}
In general, if you want something to run efficiently on some specific 
architecture, you can't expect to rely solely upon a generic optimizer with no 
knowledge of that specific architecture. So I don't think you can ever avoid 
Acero-specific optimizations. I do however think that you can get ~90% of the 
Acero-specific optimizations done in the same tree traversal that you need for 
the Substrait to Acero conversion anyway, as the more complex grunt work of 
pushing filters and projections through joins and such would already have been 
done at the Substrait level. So that's what I've been proposing.
{quote}

I am not aware of any effort in this direction, so in the current system 
Acero may be limited in functionality. I am not sure it makes sense to write 
an optimizer in Acero. IMHO, a better approach would be to write the Acero 
query, convert it to a Substrait plan, and then optimize that plan using a 
third-party optimizer. Maybe there will be something like a 
substrait-optimizer in the future (I really don't know). The optimized plan 
would then be used to create the Acero plan again. The question is whether 
such optimization is possible with a third-party optimizer; if it is, and it 
takes less time, that would be ideal, wouldn't it? Writing an Acero-native 
optimizer could be a separate project in itself. Since there are so many 
mature optimizers, can't we use one of them to optimize the sub-optimal plan? 
My knowledge of query optimization is not very strong, so I wouldn't argue 
much about it. 

{quote}
I don't think this orthogonality is a bad thing, actually. However, if the goal 
is to become a fully-featured query engine, using Substrait or otherwise, you 
do at least need to satisfy the expectations that come with it. Anecdotal, but 
in every database I've ever queried, doing the same query twice returned the 
results in the same order.
{quote}

Yes, if Acero expects to inherit all these core database features, it must do 
what they are supposed to do; no argument there. Since Acero is a streaming 
execution engine, how far we are reaching toward those goals is not yet clear 
to me. But at the end of the day, if we are benchmarking our performance 
against other systems, it would be best to support such features in as 
optimized a form as possible.

> [C++] Adding ExecNode with Sort and Fetch capability
> ----------------------------------------------------
>
>                 Key: ARROW-17183
>                 URL: https://issues.apache.org/jira/browse/ARROW-17183
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Vibhatha Lakmal Abeykoon
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>
> In Substrait integrations with ACERO, a functionality required is the ability 
> to fetch records sorted and unsorted.
> Fetch operation is defined as selecting `K` number of records with an offset. 
> For instance pick 10 records skipping the first 5 elements. Here we can 
> define this as a Slice operation and records can be easily extracted in a 
> sink-node. 
> A Sort and Fetch operation applies when we need to execute a Fetch operation 
> on sorted data. The main issue is that we cannot have a sort node followed by 
> a fetch. The reason is that all existing node definitions supporting sort are 
> based on sink nodes. Since no node can follow a sink node, this 
> functionality has to take place in a single node. 
> This is not a perfect solution for fetch and sort, but one way to do it is 
> to define a sink node where the records are sorted and then a set of items 
> is fetched. 
> Another dilemma is what to do if sort is followed by a fetch. In that case, 
> there has to be a flag to select the order of the operations. 
> The objective of this ticket is to discuss a viable, efficient solution and 
> to include new nodes or a method to execute such logic.
>  
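
For readers following the thread, the Fetch and Sort-and-Fetch semantics 
described in the ticket can be sketched in a few lines of plain Python. This 
is only an illustration of the intended behaviour on an in-memory list, not 
Acero or Substrait code: the names `fetch`, `sort_and_fetch`, and the 
`sort_first` flag are hypothetical and do not correspond to any existing 
Arrow API.

```python
from itertools import islice
from typing import Any, Callable, Iterable, List


def fetch(records: Iterable[Any], offset: int, count: int) -> List[Any]:
    """Skip the first `offset` records and return the next `count`."""
    return list(islice(records, offset, offset + count))


def sort_and_fetch(
    records: Iterable[Any],
    key: Callable[[Any], Any],
    offset: int,
    count: int,
    sort_first: bool = True,  # hypothetical flag choosing the operation order
) -> List[Any]:
    """Combine sort and fetch in one step, as a single node would have to."""
    if sort_first:
        # Sort everything, then slice the requested window.
        return fetch(sorted(records, key=key), offset, count)
    # Slice the window from the unsorted input, then sort only that window.
    return sorted(fetch(records, offset, count), key=key)


rows = [7, 3, 9, 1, 5, 8, 2]
print(fetch(rows, 2, 3))                                               # [9, 1, 5]
print(sort_and_fetch(rows, key=lambda r: r, offset=2, count=3))        # [3, 5, 7]
print(sort_and_fetch(rows, key=lambda r: r, offset=2, count=3,
                     sort_first=False))                                # [1, 5, 9]
```

The two orderings give different results on the same input, which is why a 
combined node would need some way to state which one is intended.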



--
This message was sent by Atlassian Jira
(v8.20.10#820010)