Aman Sinha created IMPALA-10360:
-----------------------------------

             Summary: Allow a simple limit to be treated as a sampling hint 
where applicable
                 Key: IMPALA-10360
                 URL: https://issues.apache.org/jira/browse/IMPALA-10360
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha


As a follow-up to IMPALA-10314, it is sometimes useful to consider a simple 
limit as a way to sample from a table if a relevant hint has been provided.  
This is especially useful if the query is against a view because a TABLESAMPLE 
clause is only supported for base tables, not views.  Here's an example that 
illustrates the motivation:

{noformat}
set optimize_simple_limit = true;
with v1 as 
(select * from fact_table /* +some_hint_for_table */
  where col in (select col from dim_table where ...)) 
select * from v1 limit 10;
{noformat}

In this case, the outer query just wants any 10 rows that satisfy the WHERE 
predicate in v1 and if we can specify the hint for the large fact_table to 
treat the simple limit as a hint for sampling, it would substantially reduce 
the query planning time without significantly compromising on the correctness.  
Without such optimization, during planning the scan ranges will be computed for 
the entire fact_table which is expensive.

Also, note that doing the naive push down of limit to the fact table is not 
advisable because then the planner may decide (under the 
optimize_simple_limit=true setting) to only look at first few partitions or 
files within a partition and those rows may not satisfy the join condition. The 
sampling will be spread out more uniformly across the partitions, so the 
chances of producing sufficient qualifying rows is much higher.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to