Jinfeng Ni created DRILL-4707:
---------------------------------
Summary: Conflicting column names under case-insensitive policy lead to either memory leak or incorrect result
Key: DRILL-4707
URL: https://issues.apache.org/jira/browse/DRILL-4707
Project: Apache Drill
Issue Type: Bug
Reporter: Jinfeng Ni
Priority: Critical
On latest master branch:
{code}
select version, commit_id, commit_message from sys.version;
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
|     version     |                 commit_id                 |                                 commit_message                                  |
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
| 1.7.0-SNAPSHOT  | 3186217e5abe3c6c2c7e504cdb695567ff577e4c  | DRILL-4607: Add a split function that allows to separate string by a delimiter  |
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
{code}
If a query has two column names that conflict under the case-insensitive policy
(for example, XYZ and xyz), Drill either leaks memory or returns an incorrect result.
Q1.
{code}
select r_regionkey as XYZ, r_name as xyz FROM cp.`tpch/region.parquet`;
Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory
leaked: (131072)
Allocator(op:0:0:1:Project) 1000000/131072/2490368/10000000000
(res/actual/peak/limit)
Fragment 0:0
{code}
Q2. Only one column is returned in the result.
{code}
select n_nationkey as XYZ, n_regionkey as xyz FROM cp.`tpch/nation.parquet`;
+------+
| XYZ |
+------+
| 0 |
| 1 |
| 1 |
| 1 |
| 4 |
| 0 |
| 3 |
{code}
The cause of the problem seems to be that the Project operator treats the two
incoming columns as identical, since Drill handles column names
case-insensitively during execution.
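The suspected collision can be illustrated with a minimal Java sketch. It does not use Drill's actual classes: the `project` helper and the lower-cased map key are assumptions made only to show how a name-keyed, case-insensitive container ends up with a single entry for XYZ and xyz.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CaseInsensitiveCollision {
    // Hypothetical stand-in for a name-keyed output container; not Drill code.
    // Each input pair is {alias, source column}.
    static Map<String, String> project(List<String[]> aliasToSource) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] pair : aliasToSource) {
            // Keys are normalized to lower case, so "XYZ" and "xyz"
            // land in the same slot and the later entry replaces the earlier one.
            out.put(pair[0].toLowerCase(Locale.ROOT), pair[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> out = project(List.of(
            new String[] {"XYZ", "n_nationkey"},
            new String[] {"xyz", "n_regionkey"}));
        // Two columns were requested, but only one entry remains.
        System.out.println(out.size() + " column(s): " + out);
    }
}
```

Whether the real operator drops the first column, the second, or leaks the buffer of the one it loses track of depends on Drill internals that this sketch does not model.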
The planner should make sure that the conflicting columns are resolved, since
execution is name-based.
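One possible way for the planner to resolve such conflicts is to make the output names unique under case-insensitive comparison before they reach execution. The sketch below is only an illustration under that assumption: the `uniquify` helper and the numeric-suffix naming scheme are hypothetical and are not taken from Drill or from this issue.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class UniquifyNames {
    // Hypothetical helper: returns names that are distinct even when
    // compared case-insensitively, preserving the original order.
    static List<String> uniquify(List<String> names) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String candidate = name;
            int suffix = 0;
            // Set.add returns false when the lower-cased form is already taken,
            // in which case a numeric suffix is appended and the check repeats.
            while (!seen.add(candidate.toLowerCase(Locale.ROOT))) {
                candidate = name + suffix++;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        // The two aliases from Q1 and Q2 conflict only by case.
        System.out.println(uniquify(List.of("XYZ", "xyz")));
    }
}
```

With names made unique this way, a name-based Project would see two distinct columns; the names shown to the user could still be restored at the top of the plan if needed.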