[
https://issues.apache.org/jira/browse/IMPALA-7942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654386#comment-17654386
]
ASF subversion and git services commented on IMPALA-7942:
---------------------------------------------------------
Commit b296567a32c8f678549fe7e40ea87d7669f81a9e in impala's branch
refs/heads/master from skyyws
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b296567a3 ]
IMPALA-7942 (part 1): Add query hints for table cardinalities
Currently, we run the 'COMPUTE STATS' command to compute table
stats, which are very useful for query planning. Without these
stats, a query plan may not be optimal. However, these stats may
not be available, up to date, or valid. To work around this problem,
this patch adds a new query hint: 'TABLE_NUM_ROWS'. We can use
this new hint after an HDFS or Kudu table in a query like this:
* select col from t /* +TABLE_NUM_ROWS(1000) */;
If set, Impala will use this value as the number of scanned rows
when the table has no stats or has corrupt stats. The hint value
is ignored if the table stats are valid.
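The fallback rule described above can be sketched as follows. This is
only an illustration of the stated behavior, not the code in the patch:
the function name 'effective_num_rows' is hypothetical, and representing
missing stats as None and corrupt stats as a negative row count is an
assumption made for this sketch.

```python
from typing import Optional


def effective_num_rows(stats_num_rows: Optional[int],
                       hint_num_rows: Optional[int]) -> Optional[int]:
    """Pick the row count a planner would use for a table scan.

    Assumptions for this sketch (not taken from the patch):
      - stats_num_rows is None when the table has no stats
      - a negative stats_num_rows stands in for corrupt stats
      - hint_num_rows is None when no TABLE_NUM_ROWS hint was given
    """
    stats_usable = stats_num_rows is not None and stats_num_rows >= 0
    if stats_usable:
        # Valid table stats always win; the hint is ignored.
        return stats_num_rows
    # Stats are missing or corrupt: fall back to the hint, which may
    # itself be None, in which case the cardinality stays unknown.
    return hint_num_rows
```

Under these assumptions, a table with valid stats of 500 rows and a hint
of 1000 still plans with 500, while a table with no stats plans with 1000.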
Testing:
- Added new fe test in 'PlannerTest'
- Added new fe test in 'AnalyzeStmtsTest' for negative cases
Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Reviewed-on: http://gerrit.cloudera.org:8080/18829
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Add query hints for cardinalities and selectivities
> ---------------------------------------------------
>
> Key: IMPALA-7942
> URL: https://issues.apache.org/jira/browse/IMPALA-7942
> Project: IMPALA
> Issue Type: New Feature
> Components: Frontend
> Affects Versions: Impala 3.2.0
> Reporter: Lars Volker
> Assignee: Sheng Wang
> Priority: Major
>
> The optimizer can pick suboptimal plans when tables don't have statistics. To
> allow users to help the optimizer, we should support query hints to specify
> cardinalities of scans, predicates (and possibly joins).
> This could look like the following example.
> {code:sql}
> select x from medium /*+ num_rows(1000000000) */
> join small /*+ num_rows(1000000) */
> join (select * from big /*+ num_rows(1000000000) */
> where c1 < 10 /*+ selectivity(0.00001) */) as big
> where medium.id = small.id and small.id = big.id;
> {code}
> Instead of cardinalities we could also support specifying the number of rows
> that pass a predicate (or join).
> We should not rely on the specified cardinalities to be accurate, e.g. the
> following should still execute a scan:
> {code:sql}
> select count(*) from T /*+ num_rows(100) */
> where id < 100 /*+ selectivity(0.1) */;
> {code}
> This is a first step towards giving users more control over the planner /
> optimizer.