[ https://issues.apache.org/jira/browse/IMPALA-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881493#comment-17881493 ]
Xuebin Su edited comment on IMPALA-12648 at 9/13/24 8:58 AM:
-------------------------------------------------------------

Hi!

I made a quick prototype of the kill function:
{code:sql}
[localhost:21051] default> select kill('9941e24eefb3410f:a41fa8f400000000');
+-------------------------------------------+
| kill('9941e24eefb3410f:a41fa8f400000000') |
+-------------------------------------------+
| true                                      |
+-------------------------------------------+
Fetched 1 row(s) in 0.15s
{code}
It takes a query ID as its parameter and returns whether the cancellation succeeded. The session that was running the cancelled query then received the following error message:
{code:sql}
ERROR: Query 9941e24eefb3410f:a41fa8f400000000 failed:
Cancelled
{code}
However, this only works for queries whose coordinator is the impalad on which the kill function is called. Otherwise, the cancellation fails with the following error message:
{code:java}
Invalid or unknown query handle: 2648fbd9cb2fe338:26a5b59700000000.
{code}
and the kill function returns false.

To overcome this limitation, it seems that we can rely on the sys.impala_query_live table. If I understand correctly, when answering a query on the sys.impala_query_live table, each impalad only returns the information of the queries that it coordinates, as indicated by the query plan:
{code:sql}
[localhost:21051] default> explain select query_id from sys.impala_query_live;
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.00MB Threads=2                         |
| Per-Host Resource Estimates: Memory=10MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| sys.impala_query_live                                                              |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN SYSTEM_TABLE [sys.impala_query_live]                                       |
|    row-size=12B cardinality=1                                                      |
+------------------------------------------------------------------------------------+
{code}
Therefore, we can kill a query if we can call the kill function on the corresponding impalad, i.e., before the tuple holding the query information reaches the EXCHANGE node.

Given that expressions like kill('9941e24eefb3410f:a41fa8f400000000') are lazily evaluated, we can achieve this by adding a SORT node before the EXCHANGE node. This means adding an ORDER BY clause when querying the sys.impala_query_live table. For example, if we want to kill queries whose state is 'RUNNING', we can run the following query:
{code:sql}
[localhost:21051] default> select query_id, kill(query_id) as killed from sys.impala_query_live where query_state = 'RUNNING' order by killed;
+-----------------------------------+--------+
| query_id                          | killed |
+-----------------------------------+--------+
| 9f4c2f6859234cef:22ba32af00000000 | true   |
+-----------------------------------+--------+
Fetched 1 row(s) in 0.10s
{code}
We can replace the predicate in the WHERE clause of this query with any other predicate to kill whichever queries we want.

In this way, the kill function is called before the tuples holding the query information leave the corresponding impalads, because the tuples need to be sorted by the result of the kill function first. As a result, if I understand correctly, this query can kill queries coordinated by any impalad.

What do you think? Thanks!

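The argument in the comment above can be illustrated with a toy model in Python. This is only a sketch and not Impala code: the `Impalad` class, the helper functions, and the short query IDs `q1` and `q2` are made up for illustration. It assumes, as the comment does, that each impalad only knows about the queries it coordinates and that kill only succeeds on that impalad. Whether the real planner evaluates kill below the EXCHANGE when an ORDER BY is added is exactly the assumption the comment asks about; the model just shows why the placement matters.

```python
from dataclasses import dataclass, field


@dataclass
class Impalad:
    """Toy stand-in for one impalad; not a real Impala API."""
    name: str
    # query_id -> state, only for queries coordinated by this node
    queries: dict = field(default_factory=dict)

    def kill(self, query_id: str) -> bool:
        """Cancel a query only if this node coordinates it and it is running."""
        if self.queries.get(query_id) != "RUNNING":
            return False  # models "Invalid or unknown query handle"
        self.queries[query_id] = "CANCELLED"
        return True

    def scan_live(self) -> list:
        """Models the per-node system table scan: local running queries only."""
        return [qid for qid, state in self.queries.items() if state == "RUNNING"]


def kill_after_exchange(coordinator: Impalad, cluster: list) -> dict:
    """kill() evaluated on the gathering node, after rows crossed the exchange."""
    rows = [qid for node in cluster for qid in node.scan_live()]
    return {qid: coordinator.kill(qid) for qid in rows}


def kill_before_exchange(cluster: list) -> dict:
    """kill() evaluated on each node before its rows are sent onward."""
    rows = []
    for node in cluster:
        local = [(qid, node.kill(qid)) for qid in node.scan_live()]
        # Per-node sort by the kill result, standing in for the SORT node.
        rows.extend(sorted(local, key=lambda row: row[1]))
    return dict(rows)


def make_cluster():
    """Two nodes, each coordinating one running query (hypothetical IDs)."""
    first = Impalad("impalad-1", {"q1": "RUNNING"})
    second = Impalad("impalad-2", {"q2": "RUNNING"})
    return first, [first, second]


if __name__ == "__main__":
    coord, cluster = make_cluster()
    print(kill_after_exchange(coord, cluster))  # {'q1': True, 'q2': False}
    coord, cluster = make_cluster()
    print(kill_before_exchange(cluster))        # {'q1': True, 'q2': True}
```

In the first call only the query coordinated by the node that gathers the rows is cancelled; in the second, every node cancels its own queries before handing its rows over.
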
> Support killing queries and sessions programatically
> ----------------------------------------------------
>
>                 Key: IMPALA-12648
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12648
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 4.3.0
>            Reporter: Manish Maheshwari
>            Assignee: Xuebin Su
>            Priority: Major
>
> Support killing queries and sessions programmatically via kill commands to be
> able to manage running Impala workloads.
> 1. Killing queries that are currently running
> {code:java}
> -- Forcibly terminates query with the specified query_id:
> KILL QUERY WHERE query_id='634bf9fcf55278eb:ac0ef05300000000' {code}
> For queries that are finished and waiting to be closed, this command
> should close them.
> 2. Killing sessions that are open
> {code:java}
> -- Forcibly terminates session and closes all queries:
> KILL SESSION WHERE session_id='2644d52c79c4c1e4:e974538f2189ed82' {code}
> This command should terminate the session, kill all active queries, and
> close all queries that are finished and waiting to be closed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org