[
https://issues.apache.org/jira/browse/IMPALA-12983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835913#comment-17835913
]
Wenzhe Zhou commented on IMPALA-12983:
--------------------------------------
Comments from Abhishek:
"The description is a bit misleading. If it's a single-table scan with ORDER BY,
then yes, we probably have to push down ORDER BY:
select * from table where <scan predicates> order by <>;
But, if it's a multiple table scan or even for single table scan if there are
aggregates then order by is typically the last operator which gets executed. So
this likely means we will also have to push down hash joins, AGGs, OLAP and
other functions before we can push down ORDER BY.
We could look into improving performance for IMPALA-IMPALA federation as there
we should be able to push down fully remote plans."
Looked at the query plans generated for the 22 TPCH queries. Only one plan
generates a TopN node, and there are other nodes between that TopN node and the
DataSourceScanNode. There are many restrictions on pushing down "order by" and
"limit", so we may not get much benefit for TPCH/TPCDS queries from
pushing down TopN.
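The restriction above can be illustrated with a small sketch. This is not Impala code: PlanNode, the node kind strings, and topn_directly_over_scan() are hypothetical names chosen for illustration. It only models the simplest condition discussed here, namely that a TopN can be handed to the remote side when nothing sits between it and the scan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical stand-in for a planner node; not Impala's PlanNode class."""
    kind: str  # e.g. "TOPN", "AGGREGATE", "HASH_JOIN", "DATA_SOURCE_SCAN"
    children: List["PlanNode"] = field(default_factory=list)


def topn_directly_over_scan(node: PlanNode) -> bool:
    """Return True only if a TopN node's single child is a data source scan."""
    return (
        node.kind == "TOPN"
        and len(node.children) == 1
        and node.children[0].kind == "DATA_SOURCE_SCAN"
    )


# Shape of: select * from t where <scan predicates> order by <> limit n
single_scan = PlanNode("TOPN", [PlanNode("DATA_SOURCE_SCAN")])

# An aggregate sits between the TopN and the scan, so the TopN cannot be
# pushed down unless the aggregate is pushed down as well.
with_agg = PlanNode(
    "TOPN", [PlanNode("AGGREGATE", [PlanNode("DATA_SOURCE_SCAN")])]
)

# A join of two scans under the TopN has the same problem.
with_join = PlanNode(
    "TOPN",
    [PlanNode("HASH_JOIN", [PlanNode("DATA_SOURCE_SCAN"),
                            PlanNode("DATA_SOURCE_SCAN")])],
)

print(topn_directly_over_scan(single_scan))  # True
print(topn_directly_over_scan(with_agg))     # False
print(topn_directly_over_scan(with_join))    # False
```

A real check would likely have more conditions (for example, whether the ordering expressions can be expressed in the remote SQL dialect); those are left out here.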
Looked at the code in SingleNodePlanner.checkAndApplyLimitPushdown(). It is not
simple to extend this function to handle DataSourceScanNode.
> Performance improvement for impala-impala federation
> ----------------------------------------------------
>
> Key: IMPALA-12983
> URL: https://issues.apache.org/jira/browse/IMPALA-12983
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Reporter: Wenzhe Zhou
> Assignee: Pranav Yogi Lodha
> Priority: Major
>
> "order by" cannot be pushed down to JDBC right now, but most tpcds/tpch
> queries use "order by ... limit ...", e.g. top-N. As a result, the JDBC
> handler retrieves all rows of the remote table from the remote database
> server, hence the bad performance.
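To make the cost described in the issue concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the remote database server. The table name, column names, and helper functions are assumptions made up for this example, and nothing here reflects how the Impala JDBC handler is implemented. It compares fetching every row and sorting locally with sending "order by ... limit ..." to the database.

```python
import sqlite3

# In-memory SQLite plays the role of the remote database (an assumption
# for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, float((i * 37) % 1000)) for i in range(1000)],
)


def top_n_without_pushdown(n):
    """Fetch every row, then sort and truncate on the local side."""
    rows = conn.execute("SELECT id, total FROM orders").fetchall()
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows[:n], len(rows)  # (result, rows transferred)


def top_n_with_pushdown(n):
    """Let the database apply ORDER BY and LIMIT before returning rows."""
    rows = conn.execute(
        "SELECT id, total FROM orders ORDER BY total DESC, id ASC LIMIT ?",
        (n,),
    ).fetchall()
    return rows, len(rows)  # (result, rows transferred)


local_result, local_fetched = top_n_without_pushdown(5)
pushed_result, pushed_fetched = top_n_with_pushdown(5)

print(local_result == pushed_result)   # same top-5 rows either way
print(local_fetched, pushed_fetched)   # rows transferred in each case
```

Both paths return the same rows; the difference is how many rows cross the connection, which is the overhead the issue describes for single-table top-N queries.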