Jinfeng Ni created DRILL-2761:
---------------------------------
Summary: ParquetGroupScan copy constructor only copies references,
leading to an out-of-sync ParquetGroupScan instance.
Key: DRILL-2761
URL: https://issues.apache.org/jira/browse/DRILL-2761
Project: Apache Drill
Issue Type: Bug
Reporter: Jinfeng Ni
Assignee: Jinfeng Ni
ParquetGroupScan has one copy constructor, which the project pushdown rule and
the partition pruning rule use to clone a modified version of the original
ParquetGroupScan instance. However, the copy constructor copies only the
references to several Collections. If the cloned instance modifies those
collections, it also modifies the contents of the collections in the original
ParquetGroupScan instance, leaving the original instance in an invalid state.
Such an invalid state can lead to incorrect query results.
For instance, consider the query:
{code}
select O_ORDERKEY,O_CUSTKEY,O_CLERK,O_COMMENT,dir0
from `/drill/testdata/partition_pruning/dfs/orders`
where (dir0=1993)
{code}
Assume the data is partitioned by year (1993, 1994, 1995). Depending on the
order in which the RelOptRules fire, a ParquetGroupScan could end up with its
"rowGroupInfos" list out of sync with its "entries" list. The optimizer then
thinks that the partition filter has been pushed down, because "entries" has
been modified and the filter has been removed from the plan, yet
"rowGroupInfos" still holds the original contents. As a result, the query
returns unwanted rows.
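The aliasing problem can be sketched in a few lines of Java. This is a
simplified illustration, not Drill's actual code: ScanSketch, shallowCopy,
defensiveCopy and pruneTo are hypothetical names, the lists hold plain strings
rather than Drill's real entry and row-group types, and the way pruneTo
updates the two lists is an assumption made only to reproduce the symptom
described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ParquetGroupScan; NOT Drill's actual class.
class ScanSketch {
    List<String> entries;        // stands in for the "entries" list
    List<String> rowGroupInfos;  // stands in for the "rowGroupInfos" list

    ScanSketch(List<String> entries, List<String> rowGroupInfos) {
        this.entries = entries;
        this.rowGroupInfos = rowGroupInfos;
    }

    // Shallow copy: the clone shares both list objects with the original.
    static ScanSketch shallowCopy(ScanSketch that) {
        return new ScanSketch(that.entries, that.rowGroupInfos);
    }

    // Defensive copy: the clone gets its own list objects.
    static ScanSketch defensiveCopy(ScanSketch that) {
        return new ScanSketch(new ArrayList<>(that.entries),
                              new ArrayList<>(that.rowGroupInfos));
    }

    // Assumed pruning behavior: "entries" is filtered in place, while
    // "rowGroupInfos" is rebuilt as a new list on this instance only.
    void pruneTo(String partition) {
        entries.removeIf(e -> !e.endsWith("/" + partition));
        List<String> kept = new ArrayList<>();
        for (String rg : rowGroupInfos) {
            if (rg.contains("/" + partition + "/")) {
                kept.add(rg);
            }
        }
        rowGroupInfos = kept;
    }

    // Builds a scan over three year partitions with one row group each.
    static ScanSketch sample() {
        List<String> entries = new ArrayList<>();
        List<String> rowGroups = new ArrayList<>();
        for (String year : new String[] {"1993", "1994", "1995"}) {
            entries.add("orders/" + year);
            rowGroups.add("orders/" + year + "/0.parquet");
        }
        return new ScanSketch(entries, rowGroups);
    }

    public static void main(String[] args) {
        ScanSketch original = sample();
        shallowCopy(original).pruneTo("1993");
        // The original now has pruned entries but unpruned row groups.
        System.out.println(original.entries.size() + " entries, "
                + original.rowGroupInfos.size() + " row groups");

        ScanSketch safe = sample();
        defensiveCopy(safe).pruneTo("1993");
        // The original is untouched when the clone owns its collections.
        System.out.println(safe.entries.size() + " entries, "
                + safe.rowGroupInfos.size() + " row groups");
    }
}
```

With the shallow copy, pruning the clone leaves the original with one entry
but three row groups, which mirrors the mismatch described above. With the
defensive copy, the original keeps all three of each.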