[ 
https://issues.apache.org/jira/browse/IMPALA-7942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654386#comment-17654386
 ] 

ASF subversion and git services commented on IMPALA-7942:
---------------------------------------------------------

Commit b296567a32c8f678549fe7e40ea87d7669f81a9e in impala's branch 
refs/heads/master from skyyws
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b296567a3 ]

IMPALA-7942 (part 1): Add query hints for table cardinalities

Currently, we run the 'COMPUTE STATS' command to compute table stats,
which are very useful for query planning. Without these stats, a
query plan may not be optimal. However, these stats may not be
available, up to date, or valid. To work around this problem,
this patch adds a new query hint: 'TABLE_NUM_ROWS'. We can use
this new hint after an HDFS or Kudu table in a query like this:

  * select col from t /* +TABLE_NUM_ROWS(1000) */;

If set, Impala will use this value as the number of rows scanned
from the table when the table has no stats or has corrupt stats.
The hint value is ignored when the table has valid stats.
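
The fallback rule described above can be sketched in a few lines of
Python. This is only an illustration of the stated behavior, not the
planner code from this patch: the function name effective_num_rows is
hypothetical, and representing missing stats as None and corrupt stats
as a negative row count is an assumption made for the example.

```python
from typing import Optional


def effective_num_rows(stats_num_rows: Optional[int],
                       hint_num_rows: Optional[int]) -> Optional[int]:
    """Pick the row count a planner would use for a table scan.

    Assumptions for this sketch (not taken from the patch):
      - stats_num_rows is None when the table has no stats
      - a negative stats_num_rows stands for corrupt stats
      - hint_num_rows is None when no TABLE_NUM_ROWS hint was given
    """
    stats_valid = stats_num_rows is not None and stats_num_rows >= 0
    if stats_valid:
        # Valid table stats always win; the hint is ignored.
        return stats_num_rows
    # Missing or corrupt stats: fall back to the hint, if any.
    return hint_num_rows
```

With this rule, a hint of 1000 is used only when the stats are missing
or corrupt; a table whose stats report 500 rows keeps that value.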

Testing:
- Added new fe test in 'PlannerTest'
- Added new fe test in 'AnalyzeStmtsTest' for negative cases

Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Reviewed-on: http://gerrit.cloudera.org:8080/18829
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Add query hints for cardinalities and selectivities
> ---------------------------------------------------
>
>                 Key: IMPALA-7942
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7942
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 3.2.0
>            Reporter: Lars Volker
>            Assignee: Sheng Wang
>            Priority: Major
>
> The optimizer can pick suboptimal plans when tables don't have statistics. To 
> allow users to help the optimizer, we should support query hints to specify 
> cardinalities of scans, predicates (and possibly joins).
> This could look like the following example.
> {code:sql}
> select x from medium /*+ num_rows(1000000000) */
>   join small /*+ num_rows(1000000) */
>   join (select * from big /*+ num_rows(1000000000) */
>         where c1 < 10 /*+ selectivity(0.00001) */) as big
>   where medium.id = small.id and small.id = big.id;
> {code}
> Instead of cardinalities we could also support specifying the number of rows 
> that pass a predicate (or join).
> We should not rely on the specified cardinalities to be accurate, e.g. the 
> following should still execute a scan:
> {code:sql}
> select count(*) from T /*+ num_rows(100) */
>   where id < 100 /*+ selectivity(0.1) */;
> {code}
> This is a first step towards giving users more control over the planner / 
> optimizer.



