[
https://issues.apache.org/jira/browse/IMPALA-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804083#comment-16804083
]
ASF subversion and git services commented on IMPALA-6326:
---------------------------------------------------------
Commit ca21c0cf0908048dbeec38b0a6874064958e0cce in impala's branch
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ca21c0c ]
IMPALA-6326: part 2: remove fetch thread in stress test
Ensure the cursor is only accessed from a single thread. This means
reworking the code so that we check the time limit between fetch calls.
Use EXEC_TIME_LIMIT_S as an alternative to the previous multi-threaded
cancellation logic - it allows queries to be cancelled even when the
client is blocked or slow. This is implemented with the concept of
a CancelMechanism that determines *how* a query should be cancelled.
Query timeouts (where we want to cancel queries that run longer
than expected) are implemented using both cancel mechanisms, in
case the client is stuck in a fetch or similar. Expected cancellations
use a randomly chosen mechanism so that both code paths get
covered.
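A minimal sketch of the single-threaded fetch loop described above. All names here (FakeCursor, fetch_with_time_limit, the mechanism constants) are illustrative stand-ins, not the actual stress-test code; the real EXEC_TIME_LIMIT_S enforcement happens server-side in Impala.

```python
import random
import time

# Hypothetical mechanism constants; the real CancelMechanism lives in the
# stress-test code and is not reproduced here.
VIA_CLIENT = "client_cancel"        # client calls cursor.cancel_operation()
VIA_OPTION = "exec_time_limit_s"    # server enforces EXEC_TIME_LIMIT_S

class FakeCursor:
    """Stand-in for an impyla HiveServer2Cursor, touched by one thread only."""
    def __init__(self, rows):
        self._rows = list(rows)
        self.cancelled = False

    def fetchmany(self, n):
        batch, self._rows = self._rows[:n], self._rows[n:]
        return batch

    def cancel_operation(self):
        self.cancelled = True

def fetch_with_time_limit(cursor, timeout_s, mechanism, clock=time.monotonic):
    """Fetch all rows, checking the time limit between fetch calls.

    With VIA_OPTION the server would kill the query itself once
    EXEC_TIME_LIMIT_S expires, so the loop just fetches. With VIA_CLIENT
    the deadline is checked between batches and the cancel is issued from
    this same thread, so the cursor is never shared across threads.
    """
    deadline = clock() + timeout_s
    results = []
    while True:
        if mechanism == VIA_CLIENT and clock() > deadline:
            cursor.cancel_operation()  # safe: same thread as the fetches
            return results, True
        batch = cursor.fetchmany(32)
        if not batch:
            return results, False
        results.extend(batch)

# Expected cancellations pick a mechanism at random so both paths get coverage.
mechanism = random.choice([VIA_CLIENT, VIA_OPTION])
```

The point of the sketch is that cancellation never requires a second thread touching the cursor, which is what triggered the segfault in IMPALA-6326.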
Testing:
Ran a cluster stress test.
Ran a couple of single-node stress tests with TPC-H and random queries.
Change-Id: If9afd74e1408823a9e5c0f2628ec9f8aafdcec69
Reviewed-on: http://gerrit.cloudera.org:8080/12681
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> segfault during impyla HiveServer2Cursor.cancel_operation() over SSL
> --------------------------------------------------------------------
>
> Key: IMPALA-6326
> URL: https://issues.apache.org/jira/browse/IMPALA-6326
> Project: IMPALA
> Issue Type: Bug
> Components: Clients
> Affects Versions: Impala 2.10.0, Impala 2.11.0
> Reporter: Matthew Mulder
> Assignee: Tim Armstrong
> Priority: Major
> Attachments: test_fork_crash.py
>
>
> During a stress test on a secure cluster one of the clients crashed in
> HiveServer2Cursor.cancel_operation().
> The stress test debug log shows
> {code}2017-12-13 16:50:52,624 21607 Query
> Consumer DEBUG:concurrent_select[579]:Requesting memory reservation
> 2017-12-13 16:50:52,624 21607 Query Consumer
> DEBUG:concurrent_select[245]:Reserved 102 MB; 1455 MB available; 95180 MB
> overcommitted
> 2017-12-13 16:50:52,625 21607 Query Consumer
> DEBUG:concurrent_select[581]:Received memory reservation
> 2017-12-13 16:50:52,658 21607 Query Consumer
> DEBUG:concurrent_select[865]:Using tpcds_300_decimal_parquet database
> 2017-12-13 16:50:52,658 21607 Query Consumer DEBUG:db_connection[203]:IMPALA:
> USE tpcds_300_decimal_parquet
> 2017-12-13 16:50:52,825 21607 Query Consumer DEBUG:db_connection[203]:IMPALA:
> SET ABORT_ON_ERROR=1
> 2017-12-13 16:50:53,060 21607 Query Consumer
> DEBUG:concurrent_select[877]:Setting mem limit to 102 MB
> 2017-12-13 16:50:53,060 21607 Query Consumer DEBUG:db_connection[203]:IMPALA:
> SET MEM_LIMIT=102M
> 2017-12-13 16:50:53,370 21607 Query Consumer
> DEBUG:concurrent_select[881]:Running query with 102 MB mem limit at
> vc0704.test with timeout secs 52:
> select
> dt.d_year,
> item.i_category_id,
> item.i_category,
> sum(ss_ext_sales_price)
> from
> date_dim dt,
> store_sales,
> item
> where
> dt.d_date_sk = store_sales.ss_sold_date_sk
> and store_sales.ss_item_sk = item.i_item_sk
> and item.i_manager_id = 1
> and dt.d_moy = 11
> and dt.d_year = 2000
> group by
> dt.d_year,
> item.i_category_id,
> item.i_category
> order by
> sum(ss_ext_sales_price) desc,
> dt.d_year,
> item.i_category_id,
> item.i_category
> limit 100;
> 2017-12-13 16:51:08,491 21607 Query Consumer
> DEBUG:concurrent_select[889]:Query id is b6425b84aa45f633:9ce7cad900000000
> 2017-12-13 16:51:15,337 21607 Query Consumer
> DEBUG:concurrent_select[900]:Waiting for query to execute
> 2017-12-13 16:51:22,316 21607 Query Consumer
> DEBUG:concurrent_select[900]:Waiting for query to execute
> 2017-12-13 16:51:27,266 21607 Fetch Results b6425b84aa45f633:9ce7cad900000000
> DEBUG:concurrent_select[1009]:Fetching result for query with id
> b6425b84aa45f633:9ce7cad900000000
> 2017-12-13 16:51:44,625 21607 Query Consumer
> DEBUG:concurrent_select[940]:Attempting cancellation of query with id
> b6425b84aa45f633:9ce7cad900000000
> 2017-12-13 16:51:44,627 21607 Query Consumer INFO:hiveserver2[259]:Canceling
> active operation{code}
> The impalad log shows
> {code}I1213 16:50:54.287511 136399
> admission-controller.cc:510] Schedule for
> id=b6425b84aa45f633:9ce7cad900000000 in pool_name=root.systest
> cluster_mem_needed=816.00 MB PoolConfig: max_requests=-1 max_queued=200
> max_mem=-1.00 B
> I1213 16:50:54.289767 136399 admission-controller.cc:515] Stats:
> agg_num_running=184, agg_num_queued=0, agg_mem_reserved=1529.63 GB,
> local_host(local_mem_admitted=132.02 GB, num_admitted_running=21,
> num_queued=0, backend_mem_reserved=194.58 GB)
> I1213 16:50:54.291550 136399 admission-controller.cc:531] Admitted query
> id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:50:54.296922 136399 coordinator.cc:99] Exec()
> query_id=b6425b84aa45f633:9ce7cad900000000 stmt=/* Mem: 102 MB. Coordinator:
> vc0704.test. */
> select
> dt.d_year,
> item.i_category_id,
> item.i_category,
> sum(ss_ext_sales_price)
> from
> date_dim dt,
> store_sales,
> item
> where
> dt.d_date_sk = store_sales.ss_sold_date_sk
> and store_sales.ss_item_sk = item.i_item_sk
> and item.i_manager_id = 1
> and dt.d_moy = 11
> and dt.d_year = 2000
> group by
> dt.d_year,
> item.i_category_id,
> item.i_category
> order by
> sum(ss_ext_sales_price) desc,
> dt.d_year,
> item.i_category_id,
> item.i_category
> limit 100;
> I1213 16:50:59.263310 136399 query-state.cc:151] Using query memory limit
> from query options: 102.00 MB
> I1213 16:50:59.267033 136399 mem-tracker.cc:189] Using query memory limit:
> 102.00 MB
> I1213 16:50:59.272271 136399 coordinator.cc:357] starting execution on 8
> backends for query b6425b84aa45f633:9ce7cad900000000
> I1213 16:51:07.525143 136399 coordinator.cc:370] started execution on 8
> backends for query b6425b84aa45f633:9ce7cad900000000
> I1213 16:51:08.358772 136399 impala-hs2-server.cc:490] ExecuteStatement():
> return_val=TExecuteStatementResp {
> 01: status (struct) = TStatus {
> 01: statusCode (i32) = 0,
> },
> 02: operationHandle (struct) = TOperationHandle {
> 01: operationId (struct) = THandleIdentifier {
> 01: guid (string) =
> "3\xf6E\xaa\x84[B\xb6\x00\x00\x00\x00\xd9\xca\xe7\x9c",
> 02: secret (string) =
> "3\xf6E\xaa\x84[B\xb6\x00\x00\x00\x00\xd9\xca\xe7\x9c",
> },
> 02: operationType (i32) = 0,
> 03: hasResultSet (bool) = true,
> },
> }
> I1213 16:51:27.310685 136399 impala-hs2-server.cc:725]
> GetResultSetMetadata(): query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:51:27.316963 136399 impala-hs2-server.cc:759]
> GetResultSetMetadata(): return_val=TGetResultSetMetadataResp {
> 01: status (struct) = TStatus {
> 01: statusCode (i32) = 0,
> },
> 02: schema (struct) = TTableSchema {
> 01: columns (list) = list<struct>[4] {
> [0] = TColumnDesc {
> 01: columnName (string) = "d_year",
> 02: typeDesc (struct) = TTypeDesc {
> 01: types (list) = list<struct>[1] {
> [0] = TTypeEntry {
> 01: primitiveEntry (struct) = TPrimitiveTypeEntry {
> 01: type (i32) = 3,
> },
> },
> },
> },
> 03: position (i32) = 0,
> },
> [1] = TColumnDesc {
> 01: columnName (string) = "i_category_id",
> 02: typeDesc (struct) = TTypeDesc {
> 01: types (list) = list<struct>[1] {
> [0] = TTypeEntry {
> 01: primitiveEntry (struct) = TPrimitiveTypeEntry {
> 01: type (i32) = 3,
> },
> },
> },
> },
> 03: position (i32) = 1,
> },
> [2] = TColumnDesc {
> 01: columnName (string) = "i_category",
> 02: typeDesc (struct) = TTypeDesc {
> 01: types (list) = list<struct>[1] {
> [0] = TTypeEntry {
> 01: primitiveEntry (struct) = TPrimitiveTypeEntry {
> 01: type (i32) = 7,
> },
> },
> },
> },
> 03: position (i32) = 2,
> },
> [3] = TColumnDesc {
> 01: columnName (string) = "sum(ss_ext_sales_price)",
> 02: typeDesc (struct) = TTypeDesc {
> 01: types (list) = list<struct>[1] {
> [0] = TTypeEntry {
> 01: primitiveEntry (struct) = TPrimitiveTypeEntry {
> 01: type (i32) = 15,
> 02: typeQualifiers (struct) = TTypeQualifiers {
> 01: qualifiers (map) = map<string,struct>[2] {
> "precision" -> TTypeQualifierValue {
> 01: i32Value (i32) = 38,
> },
> "scale" -> TTypeQualifierValue {
> 01: i32Value (i32) = 2,
> },
> },
> },
> },
> },
> },
> },
> 03: position (i32) = 3,
> },
> },
> },
> }
> I1213 16:51:27.426277 136399 coordinator.cc:789] Coordinator waiting for
> backends to finish, 2 remaining
> I1213 16:52:42.215745 136399 coordinator.cc:794] All backends finished
> successfully.
> I1213 16:52:42.223140 136399 coordinator.cc:1090] Release admssion control
> resources for query b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.230653 136399 impala-hs2-server.cc:677] CancelOperation():
> query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.230882 136399 impala-server.cc:1075] Cancel():
> query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.233353 136399 coordinator.cc:895] Cancel()
> query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.234686 136399 coordinator.cc:905] CancelBackends()
> query_id=b6425b84aa45f633:9ce7cad900000000, tried to cancel 0 backends
> I1213 16:52:42.276818 136399 impala-server.cc:1796] Connection from client
> 172.28.194.153:60218 closed, closing 2 associated session(s)
> I1213 16:52:42.277026 136399 impala-server.cc:992] UnregisterQuery():
> query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.279270 136399 impala-server.cc:1075] Cancel():
> query_id=b6425b84aa45f633:9ce7cad900000000
> I1213 16:52:42.924029 136399 query-exec-mgr.cc:149] ReleaseQueryState():
> query_id=b6425b84aa45f633:9ce7cad900000000 refcnt=1
> I1213 16:52:43.094966 136399 thrift-util.cc:123] SSL_shutdown: tlsv1 alert
> record overflow{code}
> The test host kernel log shows
> {code}Dec 14 00:52:42
> ip-172-28-194-153 kernel: [ 6295.010585] show_signal_msg: 36 callbacks
> suppressed
> Dec 14 00:52:42 ip-172-28-194-153 kernel: [ 6295.010592] python[22107]:
> segfault at 7f35001d4000 ip 00007f3557110794 sp 00007f350dfe4308 error 4 in
> libc-2.19.so[7f3557076000+1be000]{code}
> Although the stress test debug log
> shows that it started fetching results, none were saved to the output file
> for that query.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]