[ 
https://issues.apache.org/jira/browse/IMPALA-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881493#comment-17881493
 ] 

Xuebin Su edited comment on IMPALA-12648 at 9/13/24 9:30 AM:
-------------------------------------------------------------

Hi! I made a quick prototype of the kill function:
{code:sql}
[localhost:21051] default> select kill('9941e24eefb3410f:a41fa8f400000000');
+-------------------------------------------+
| kill('9941e24eefb3410f:a41fa8f400000000') |
+-------------------------------------------+
| true                                      |
+-------------------------------------------+
Fetched 1 row(s) in 0.15s
{code}
It takes a query ID as its parameter and returns whether the cancellation 
succeeded.

The session that was running the cancelled query then received the following 
error message:
{code:sql}
ERROR: Query 9941e24eefb3410f:a41fa8f400000000 failed:
Cancelled
{code}
However, this only works for queries whose coordinator is the impalad on which 
the kill function is called. Otherwise, the cancellation fails with an error 
message like:
{code:java}
Invalid or unknown query handle
{code}
and the kill function will return false.

To overcome this limitation, it seems that we can rely on the 
sys.impala_query_live table.

If I understand correctly, when scanning the sys.impala_query_live table, each 
impalad only returns information about the queries that it coordinates, as 
indicated by the query plan:
{code:sql}
[localhost:21051] default> explain select query_id from sys.impala_query_live;
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.00MB Threads=2                         |
| Per-Host Resource Estimates: Memory=10MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| sys.impala_query_live                                                              |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN SYSTEM_TABLE [sys.impala_query_live]                                       |
|    row-size=12B cardinality=1                                                      |
+------------------------------------------------------------------------------------+
 {code}
Therefore, we can kill a query if we can call the kill function on the 
corresponding impalad, i.e., before the tuple holding the query information 
reaches the EXCHANGE node.

Given that expressions like kill('9941e24eefb3410f:a41fa8f400000000') are 
lazily evaluated, we can achieve this by adding a SORT node before the EXCHANGE 
node. This means adding an ORDER BY clause when querying the 
sys.impala_query_live table.

For example, if we want to kill queries whose state is 'RUNNING', we can run 
the following query:
{code:sql}
[localhost:21051] default> select query_id, kill(query_id) as killed
from sys.impala_query_live
where query_state = 'RUNNING'
order by killed;
+-----------------------------------+--------+
| query_id                          | killed |
+-----------------------------------+--------+
| 9f4c2f6859234cef:22ba32af00000000 | true   |
+-----------------------------------+--------+
Fetched 1 row(s) in 0.10s
{code}
We can replace the predicate in the WHERE clause with any other predicate to 
kill whichever queries we want, without changing the rest of the query.
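
As a sketch of such a variation, the query below would target long-running 
queries from a single user. The db_user and total_time_ms column names, the 
'etl_user' value, and the 10-minute threshold are assumptions for illustration 
and should be checked against the actual table schema:
{code:sql}
-- Hypothetical: kill RUNNING queries of one user that have run for over 10 minutes.
-- db_user and total_time_ms are assumed column names of sys.impala_query_live.
select query_id, kill(query_id) as killed
from sys.impala_query_live
where query_state = 'RUNNING'
  and db_user = 'etl_user'
  and total_time_ms > 10 * 60 * 1000
order by killed;
{code}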

In this way, the kill function gets called before the tuples holding the query 
information leave the corresponding impalads, because the tuples need to be 
sorted by the result of the kill function first.
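
One way to check where the kill call ends up would be to EXPLAIN the same 
query. This is only a sketch and no output is shown here. If the reasoning 
above holds, the plan should presumably show a SORT node between the 
SCAN SYSTEM_TABLE node and the exchange:
{code:sql}
-- Inspect the plan only; EXPLAIN does not execute the query.
-- Expectation (unverified): a SORT node appears below the exchange.
explain select query_id, kill(query_id) as killed
from sys.impala_query_live
where query_state = 'RUNNING'
order by killed;
{code}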

As a result, if I understand correctly, this query can kill queries coordinated 
by any impalad.

What do you think? Thanks!


> Support killing queries and sessions programatically
> ----------------------------------------------------
>
>                 Key: IMPALA-12648
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12648
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 4.3.0
>            Reporter: Manish Maheshwari
>            Assignee: Xuebin Su
>            Priority: Major
>
> Support killing queries and sessions programatically via kill commands to be 
> able to manage impala running workloads.
> 1. Killing Queries that are currently running
> {code:java}
> -- Forcibly terminates query with the specified query_id:
> KILL QUERY WHERE query_id='634bf9fcf55278eb:ac0ef05300000000' {code}
> For queries that are finished and waiting to be closed, this command 
> should close them.
> 2. Killing sessions that are open
> {code:java}
> -- Forcibly terminates session and closes all queries: 
> KILL SESSION WHERE session_id='2644d52c79c4c1e4:e974538f2189ed82'  {code}
> This command should terminate the session, kill all active queries, and 
> close all queries that are finished and waiting to be closed.
>  


