[ https://issues.apache.org/jira/browse/IMPALA-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253310#comment-17253310 ]

ASF subversion and git services commented on IMPALA-10317:
----------------------------------------------------------

Commit 4099a606892c377b9e8c9c6df2a45a7d42afcaea in impala's branch 
refs/heads/master from Fucun Chu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4099a60 ]

IMPALA-10317: Add query option that limits huge joins at runtime

This patch adds support for limiting the rows produced by a join node
such that runaway join queries can be prevented.

The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.
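A minimal Python model of the periodic check described above, showing why the final row count can overshoot the limit. This is not Impala's implementation. `JoinNodeStats`, `check_join_rows_limit`, `simulate`, the batch sizes, the check interval and the error text are all hypothetical. Treating a limit of 0 as "no limit" is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class JoinNodeStats:
    """Hypothetical snapshot of one join node's counters (e.g. from a profile)."""
    node_id: int
    name: str
    rows_returned: int


def check_join_rows_limit(nodes: Iterable[JoinNodeStats], limit: int) -> Optional[str]:
    """Return an error message for the first join node over `limit`, else None.

    Assumption: a limit <= 0 means "no limit". The message text is made up.
    """
    if limit <= 0:
        return None
    for node in nodes:
        if node.rows_returned > limit:
            return (f"join rows produced limit ({limit}) exceeded by "
                    f"{node.name} (id={node.node_id}): {node.rows_returned} rows")
    return None


def simulate(batch_sizes: Iterable[int], check_every: int, limit: int) -> int:
    """Add rows batch by batch, but only run the check every `check_every`
    batches (a periodic, not per-row, check). Return the rows produced when
    the simulated query is stopped, or the total if the limit is never hit."""
    node = JoinNodeStats(node_id=2, name="NESTED_LOOP_JOIN_NODE", rows_returned=0)
    for i, n in enumerate(batch_sizes, start=1):
        node.rows_returned += n
        if i % check_every == 0 and check_join_rows_limit([node], limit) is not None:
            return node.rows_returned  # terminated, possibly above the limit
    return node.rows_returned
```

For example, with 1024-row batches, a limit of 10,000 and a check every 4 batches, `simulate([1024] * 100, check_every=4, limit=10_000)` returns 12288. The check after 8 batches (8192 rows) passes, and the next check after 12 batches sees 12288 rows. Checking less often is cheaper but permits a larger overshoot.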

JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.

The query profile is updated to include query-wide and per-backend
rows-produced metrics (RowsReturned). Example profile output from:

  set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
  select count(*) from tpch_parquet.lineitem l1 cross join
  (select * from tpch_parquet.lineitem l2 limit 5) l3;

NESTED_LOOP_JOIN_NODE (id=2):
   - InactiveTotalTime: 107.534ms
   - PeakMemoryUsage: 16.00 KB (16384)
   - ProbeRows: 1.02K (1024)
   - ProbeTime: 0.000ns
   - RowsReturned: 10.00M (10002025)
   - RowsReturnedRate: 749.58 K/sec
   - TotalTime: 13s337ms
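In the profile above, the join node reports 10,002,025 rows returned against a limit of 10,000,000. The overshoot allowed by periodic checking is therefore small relative to the limit:

```latex
10\,002\,025 - 10\,000\,000 = 2\,025 \text{ rows},
\qquad \frac{2\,025}{10\,000\,000} \approx 0.02\%
```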

Testing:
 Added tests for JOIN_ROWS_PRODUCED_LIMIT

Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Reviewed-on: http://gerrit.cloudera.org:8080/16706
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Add query option that limits join #rows at runtime
> --------------------------------------------------
>
>                 Key: IMPALA-10317
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10317
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>            Reporter: Fucun Chu
>            Assignee: Fucun Chu
>            Priority: Major
>         Attachments: query82_summary.png
>
>
> Reject queries whose join operators produce too many rows while the query
> is executing.
> This is a mechanism to protect the cluster from potentially harmful queries.
> When the input tables are very large and the join conditions are poorly
> selective, a join can produce a huge number of rows, sometimes tens of
> billions, which degrades the cluster and affects other running queries.
> In our environment, we added a NUM_JOIN_ROWS_PRODUCED_LIMIT query option to
> limit the number of rows produced by a single join operator.
> The implementation follows
> [IMPALA-6034|https://issues.apache.org/jira/browse/IMPALA-6034] and uses the
> exec summary (see the attached figure) to check the number of rows produced
> by each join operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
