Aman Sinha created IMPALA-10360:
-----------------------------------
Summary: Allow a simple limit to be treated as a sampling hint
where applicable
Key: IMPALA-10360
URL: https://issues.apache.org/jira/browse/IMPALA-10360
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Aman Sinha
Assignee: Aman Sinha
As a follow-up to IMPALA-10314, it is sometimes useful to consider a simple
limit as a way to sample from a table if a relevant hint has been provided.
This is especially useful if the query is against a view because a TABLESAMPLE
clause is only supported for base tables, not views. Here's an example that
illustrates the motivation:
{noformat}
set optimize_simple_limit = true;
with v1 as
(select * from fact_table /* +some_hint_for_table */
where col in (select col from dim_table where ...))
select * from v1 limit 10;
{noformat}
In this case, the outer query just wants any 10 rows that satisfy the WHERE
predicate in v1 and if we can specify the hint for the large fact_table to
treat the simple limit as a hint for sampling, it would substantially reduce
the query planning time without significantly compromising on the correctness.
Without such optimization, during planning the scan ranges will be computed for
the entire fact_table which is expensive.
Also, note that doing the naive push down of limit to the fact table is not
advisable because then the planner may decide (under the
optimize_simple_limit=true setting) to only look at first few partitions or
files within a partition and those rows may not satisfy the join condition. The
sampling will be spread out more uniformly across the partitions, so the
chances of producing sufficient qualifying rows is much higher.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]