John Omernik created DRILL-4379:
-----------------------------------
Summary: Unexpected Table Behavior with only one subdirectory vs. Many
Key: DRILL-4379
URL: https://issues.apache.org/jira/browse/DRILL-4379
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.4.0
Reporter: John Omernik
A common practice is to use directories below a main directory as a
partitioning device. Say you have a table named "myawesomedata" and data
arrives in that table every day. It is then valuable to create the main
directory with one subdirectory per day, which helps optimize queries that run
against only certain days of data:
/myawesomedata/
/myawesomedata/2016-02-01
/myawesomedata/2016-02-02
/myawesomedata/2016-02-03
/myawesomedata/2016-02-04
I have identified a condition: if there is ONLY one subdirectory, a query that
filters on that subdirectory does not return the results a user would expect.
Example:
In the above, if I run a query of
select count(1) from `myawesomedata`;
I get an accurate count across all subdirectories.
If I run:
select count(1) from `myawesomedata` where dir0 = '2016-02-01';
I get an accurate count for only the subdirectory 2016-02-01.
However, if I delete subdirectories 2016-02-02, 2016-02-03, and 2016-02-04 and
am left with:
/myawesomedata/
/myawesomedata/2016-02-01
Then if I run
select count(1) from `myawesomedata`;
It returns the accurate count (which is just that of the 2016-02-01 directory).
However, if I run
select count(1) from `myawesomedata` where dir0 = '2016-02-01';
It takes much longer (15 seconds vs. instant for the other queries) and returns
no results, even though this is the same query that worked above with 2 or
more subdirectories. Basically, when there is only one subdirectory, a query
asking for only that directory does not behave the same way as when there are
more subdirectories. This is an unexpected user experience, and I believe it
could cause user frustration and unexpected results when using Drill on data.
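For anyone trying to reproduce this, here is a minimal sketch of how the two directory layouts described above could be built locally. It is only an illustration under assumptions: the helper name build_layout, the file name data.csv, and the row contents are hypothetical and are not taken from my actual data, and the script only creates files. The queries still have to be run by hand in Drill against the resulting path.

```python
import csv
import tempfile
from pathlib import Path


def build_layout(root, days, rows_per_day=3):
    """Create <root>/myawesomedata/<day>/data.csv for each day.

    Hypothetical helper for reproduction only; returns the table directory.
    """
    table = Path(root) / "myawesomedata"
    for day in days:
        day_dir = table / day
        day_dir.mkdir(parents=True, exist_ok=True)
        # One small CSV per day; the contents are placeholder rows.
        with open(day_dir / "data.csv", "w", newline="") as f:
            writer = csv.writer(f)
            for i in range(rows_per_day):
                writer.writerow([day, i])
    return table


if __name__ == "__main__":
    # Layout 1: four daily subdirectories (filter on dir0 works as expected).
    many = build_layout(
        tempfile.mkdtemp(),
        ["2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04"],
    )
    # Layout 2: a single subdirectory (the case reported in this issue).
    single = build_layout(tempfile.mkdtemp(), ["2016-02-01"])
    print("many subdirectories:", many)
    print("single subdirectory:", single)
```

Point a Drill workspace at each printed path in turn and run the two count queries shown earlier to compare the behavior.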
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)