[
https://issues.apache.org/jira/browse/DRILL-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jinfeng Ni reassigned DRILL-5586:
---------------------------------
Assignee: Jinfeng Ni
> UnionAll operator does more than necessary value vector allocation and copy
> ---------------------------------------------------------------------------
>
> Key: DRILL-5586
> URL: https://issues.apache.org/jira/browse/DRILL-5586
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
>
> When the inputs to a UnionAll operator are simple field references, instead
> of expressions involving a function that requires evaluation, the operator
> should leverage the value vector's transfer API. A transfer avoids allocating
> a buffer for the value vector in the outgoing batch, as well as the overhead
> of copying the data from the incoming batch to the outgoing batch.
> For example, in the following query:
> {code}
> select l_orderkey from cp.`tpch/lineitem.parquet` l union all select
> n_nationkey from cp.`tpch/nation.parquet`
> {code}
> Both the left and right sides of the UnionAll operator are simple field
> references, and Drill should call the transfer API. However, the current code
> does buffer allocation and copy for both the left and the right input. Such
> processing can significantly slow the UnionAll operator, and eventually slows
> down query evaluation.
> DRILL-5521 reverts a change, made in DRILL-5419, to the logic that decides
> whether to apply transfer based on a SchemaPath equality comparison. Even if
> we fix that problem, a SchemaPath equality comparison is not sufficient as the
> criterion for whether transfer should be used. Ideally, even when the output
> field and the incoming field have different names, the UnionAll operator
> should do {{transfer}}, instead of {{copy}}, as long as the expression is a
> simple field reference.
> {code}
> select l_orderkey as Key1 from cp.`tpch/lineitem.parquet` l union all select
> n_nationkey as Key2 from cp.`tpch/nation.parquet`
> {code}
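The distinction drawn in the description can be illustrated with a toy model. This is a minimal sketch, not Drill's implementation: `UnionAllSketch`, `Expr`, `FieldRef`, `FunctionCall`, `ToyVector`, `canTransfer`, and `project` are all hypothetical names, and a plain `long[]` stands in for a value vector's buffer. It only shows the proposed criterion, which is that the shape of the expression, not name equality, decides between transfer and copy.

```java
import java.util.function.LongUnaryOperator;

public class UnionAllSketch {

    /** Hypothetical stand-in for the expression projected from one UnionAll input. */
    interface Expr {}

    /** Plain column reference; the output name may differ (l_orderkey AS Key1). */
    static final class FieldRef implements Expr {
        final String inputName;
        final String outputName;

        FieldRef(String inputName, String outputName) {
            this.inputName = inputName;
            this.outputName = outputName;
        }
    }

    /** Expression that has to be evaluated value by value. */
    static final class FunctionCall implements Expr {
        final LongUnaryOperator fn;

        FunctionCall(LongUnaryOperator fn) {
            this.fn = fn;
        }
    }

    /** Toy value vector that owns a single buffer (assumption: one long[] per vector). */
    static final class ToyVector {
        long[] buffer;

        ToyVector(long[] buffer) {
            this.buffer = buffer;
        }
    }

    /** Criterion argued for in the description: expression shape, not name equality. */
    static boolean canTransfer(Expr expr) {
        return expr instanceof FieldRef;
    }

    /** Produces the outgoing vector for one input of the union. */
    static ToyVector project(Expr expr, ToyVector incoming) {
        if (canTransfer(expr)) {
            // Transfer: hand buffer ownership to the outgoing vector.
            // No buffer allocation and no per-value copy.
            ToyVector outgoing = new ToyVector(incoming.buffer);
            incoming.buffer = null;
            return outgoing;
        }
        // Evaluation path: allocate a fresh buffer and write each result into it.
        LongUnaryOperator fn = ((FunctionCall) expr).fn;
        long[] fresh = new long[incoming.buffer.length];
        for (int i = 0; i < fresh.length; i++) {
            fresh[i] = fn.applyAsLong(incoming.buffer[i]);
        }
        return new ToyVector(fresh);
    }

    public static void main(String[] args) {
        long[] keys = {1L, 2L, 3L};
        ToyVector renamed = project(new FieldRef("l_orderkey", "Key1"), new ToyVector(keys));
        System.out.println("renamed field reuses buffer: " + (renamed.buffer == keys));

        ToyVector computed = project(new FunctionCall(v -> v + 1), new ToyVector(keys));
        System.out.println("function call reuses buffer: " + (computed.buffer == keys));
    }
}
```

In this sketch the renamed field reference (`l_orderkey` output as `Key1`) still takes the transfer path, because `canTransfer` never compares names.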