Aman Sinha created IMPALA-10314:
-----------------------------------
Summary: Planning time for simple SELECT with LIMIT could be improved
Key: IMPALA-10314
URL: https://issues.apache.org/jira/browse/IMPALA-10314
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Aman Sinha
Assignee: Aman Sinha
Consider a table t1 with the following characteristics:
{noformat}
HDFS, Parquet format, external table
number of partitions in t1 : 39000 (2-level partitioning)
number of columns : 72
number of files : 350000
{noformat}
The planning time for the following query, which has a LIMIT but no ORDER BY,
is fairly long:
{noformat}
select * from t1 limit 10;
Query Compilation: 4s411ms
- Single node plan created: 3s812ms (3s259ms)
{noformat}
The bulk of the time is spent in HdfsScanNode.computeScanRangeLocations(), which
iterates over all the partitions and the file descriptors within each partition
to assign scan ranges based on data affinity. For trivial LIMIT queries,
especially those with small LIMIT values, we should look at ways to improve the
planning time.
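
One possible direction, sketched below under stated assumptions, is to stop enumerating file descriptors once a small number of non-empty files has been collected, instead of visiting every file in every partition. This is an illustration only and not Impala's actual code: the classes FileDesc and Partition, the method selectFilesForSimpleLimit, and the maxFiles cap are all hypothetical stand-ins. The sketch also assumes the query has no predicates, and a real implementation would need a fallback for the case where the chosen files hold fewer rows than the LIMIT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are Impala's real classes.
class SimpleLimitScanSketch {
  // Stand-in for a file descriptor: only the path and length are modeled.
  static final class FileDesc {
    final String path;
    final long lengthBytes;
    FileDesc(String path, long lengthBytes) {
      this.path = path;
      this.lengthBytes = lengthBytes;
    }
  }

  // Stand-in for a table partition holding its file descriptors.
  static final class Partition {
    final String name;
    final List<FileDesc> files;
    Partition(String name, List<FileDesc> files) {
      this.name = name;
      this.files = files;
    }
  }

  // Walks partitions in order and returns at most maxFiles non-empty files.
  // The early return is the point: the remaining partitions and files are
  // never visited, so the work is bounded by maxFiles (plus any empty files
  // skipped along the way) rather than by the table's total file count.
  static List<FileDesc> selectFilesForSimpleLimit(
      List<Partition> partitions, int maxFiles) {
    List<FileDesc> selected = new ArrayList<>();
    for (Partition p : partitions) {
      for (FileDesc f : p.files) {
        if (f.lengthBytes == 0) continue;  // an empty file cannot supply rows
        selected.add(f);
        if (selected.size() >= maxFiles) return selected;
      }
    }
    return selected;  // fewer than maxFiles non-empty files exist
  }

  public static void main(String[] args) {
    List<Partition> partitions = new ArrayList<>();
    for (int i = 0; i < 39000; i++) {
      partitions.add(new Partition("p" + i, Arrays.asList(
          new FileDesc("p" + i + "/part-0.parq", 1024L))));
    }
    List<FileDesc> picked = selectFilesForSimpleLimit(partitions, 10);
    System.out.println("files selected: " + picked.size());
  }
}
```

Scan ranges would then be computed only for the selected files. Whether such a shortcut is safe depends on the query shape (no WHERE clause, no ORDER BY, no joins), so any real change would have to detect that shape first.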
--
This message was sent by Atlassian Jira
(v8.3.4#803005)