Jinfeng Ni created DRILL-5586:
---------------------------------
Summary: UnionAll operator does more than necessary value vector allocation and copy
Key: DRILL-5586
URL: https://issues.apache.org/jira/browse/DRILL-5586
Project: Apache Drill
Issue Type: Bug
Reporter: Jinfeng Ni
When the inputs to a UnionAll operator are simple field references, rather than
expressions involving a function that require evaluation, the operator should
leverage the value vector's transfer API. A transfer avoids allocating buffers
for the value vectors in the outgoing batch, as well as the overhead of copying
the data from the incoming batch to the outgoing batch.
For example, in the following query:
{code}
select l_orderkey from cp.`tpch/lineitem.parquet` l union all select
n_nationkey from cp.`tpch/nation.parquet`
{code}
Both the left and right inputs of the UnionAll operator are simple field
references, so Drill should use the transfer API. However, the current code
allocates buffers and copies data for both the left and right sides. This extra
work significantly slows down the UnionAll operator and, in turn, the whole query.
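To make the cost difference concrete, here is a minimal sketch in a deliberately simplified model. The {{TransferVsCopy}} and {{IntVector}} classes and the {{copyFrom}}/{{transferFrom}} methods are hypothetical names invented for illustration; they are not Drill's actual value vector API. Copying allocates a fresh buffer and touches every value, while transferring only hands buffer ownership from the incoming vector to the outgoing one.

{code:java}
import java.util.Arrays;

// Hypothetical, simplified stand-in for a value vector; NOT Drill's actual API.
class TransferVsCopy {
    static final class IntVector {
        // Shared empty buffer, so giving up ownership does not allocate.
        static final int[] EMPTY = new int[0];

        int[] buffer = EMPTY;
        int valueCount = 0;

        // Copy: allocate a fresh buffer in the outgoing vector, then copy every value (O(n)).
        void copyFrom(IntVector in) {
            buffer = Arrays.copyOf(in.buffer, in.valueCount);
            valueCount = in.valueCount;
        }

        // Transfer: take over the incoming buffer; no allocation and no per-value copy.
        void transferFrom(IntVector in) {
            buffer = in.buffer;
            valueCount = in.valueCount;
            // The incoming vector gives up ownership so the buffer has a single owner.
            in.buffer = EMPTY;
            in.valueCount = 0;
        }
    }

    public static void main(String[] args) {
        IntVector incoming = new IntVector();
        incoming.buffer = new int[] {10, 20, 30};
        incoming.valueCount = 3;

        // Copy path: the outgoing vector ends up with its own, newly allocated buffer.
        IntVector copied = new IntVector();
        copied.copyFrom(incoming);
        System.out.println(copied.buffer != incoming.buffer); // true

        // Transfer path: the outgoing vector reuses the very same buffer object.
        int[] original = incoming.buffer;
        IntVector transferred = new IntVector();
        transferred.transferFrom(incoming);
        System.out.println(transferred.buffer == original); // true
        System.out.println(incoming.valueCount); // 0
    }
}
{code}

In this model the copy path scales with the number of values in the batch, while the transfer path is a constant-time pointer hand-off, which is why copying when a transfer would suffice adds avoidable work to every batch.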
DRILL-5521 reverts the change made in DRILL-5419 to the logic that decides,
based on a SchemaPath equality comparison, whether to apply transfer. Even if we
fix that problem, a SchemaPath equality comparison is not a sufficient criterion
for deciding whether transfer should be used. Ideally, even when the output field
and the incoming field have different names, the UnionAll operator should do
{{transfer}} instead of {{copy}}, as long as the expression is a simple field
reference. For example:
{code}
select l_orderkey as Key1 from cp.`tpch/lineitem.parquet` l union all select
n_nationkey as Key2 from cp.`tpch/nation.parquet`
{code}