Aman Sinha created IMPALA-10314:
-----------------------------------

             Summary: Planning time for simple SELECT with LIMIT could be improved
                 Key: IMPALA-10314
                 URL: https://issues.apache.org/jira/browse/IMPALA-10314
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha


Consider a table t1 with the following characteristics:
{noformat}
HDFS, Parquet format, external table
number of partitions in t1 : 39000 (2 level partitioning)
number of columns : 72
number of files : 350000
{noformat}

The planning time for the following query, which has a LIMIT but no ORDER BY, 
is fairly long:
{noformat}
select * from t1 limit 10;

Query Compilation: 4s411ms
   - Single node plan created: 3s812ms (3s259ms)
{noformat}

The bulk of the time is spent in HdfsScanNode.computeScanRangeLocations(), 
which iterates over every partition and every file descriptor within each 
partition to assign scan ranges based on data affinity.  For trivial LIMIT 
queries, especially those with small LIMIT values, we should look at ways to 
improve the planning time.
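
One possible direction, sketched below, is to stop walking partitions and file descriptors once enough files have been collected to satisfy the LIMIT. This is only an illustration and not Impala's code: the classes FileDesc and Partition and the method selectFilesForLimit are hypothetical stand-ins, and the sketch assumes that every non-empty file contributes at least one row, which would have to be verified (or handled with a fallback) in a real planner.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleLimitScanSketch {

    /** Hypothetical stand-in for a file descriptor; not Impala's class. */
    static final class FileDesc {
        final String path;
        final long lengthBytes;

        FileDesc(String path, long lengthBytes) {
            this.path = path;
            this.lengthBytes = lengthBytes;
        }
    }

    /** Hypothetical stand-in for a partition and its file descriptors. */
    static final class Partition {
        final String name;
        final List<FileDesc> files;

        Partition(String name, List<FileDesc> files) {
            this.name = name;
            this.files = files;
        }
    }

    /**
     * Collects at most 'limit' non-empty files and returns as soon as that
     * many have been found, instead of visiting every partition and file.
     * ASSUMPTION: each non-empty file holds at least one row, so 'limit'
     * files are enough to produce 'limit' rows.
     */
    static List<FileDesc> selectFilesForLimit(List<Partition> partitions, long limit) {
        List<FileDesc> selected = new ArrayList<>();
        if (limit <= 0) {
            return selected;
        }
        for (Partition partition : partitions) {
            for (FileDesc file : partition.files) {
                if (file.lengthBytes == 0) {
                    continue; // an empty file cannot contribute rows
                }
                selected.add(file);
                if (selected.size() >= limit) {
                    return selected; // early exit: enough files collected
                }
            }
        }
        return selected; // fewer non-empty files than the limit
    }
}
```

Under that assumption, for "select * from t1 limit 10" the loop would return after collecting 10 non-empty files rather than visiting all 350000 file descriptors, and scan ranges would then be assigned only for the selected files. If the assumption does not hold for some file format, the query could return fewer rows than requested, so such a shortcut would need to be restricted to cases where it is known to be safe.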

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
