High CPU utilization is a side effect of cancellations, that issue is fixed in newer releases. For now try to figure why the queries are failing or getting cancelled.
Thanks Mostafa > On May 24, 2018, at 8:01 PM, 蒋鸿翔 <[email protected]> wrote: > > I find lots of query Cancel in the impala.INFO file like below: > > I0525 10:58:53.943994 1768 data-stream-mgr.cc:247] DeregisterRecvr(): > fragment_instance_id=744b5e9cc58577e0:4e0ce9cb000000b7, node=19 > I0525 10:58:53.948987 3937 query-state.cc:384] Instance completed. > instance_id=444b3fde46797eb0:7006c870000000b7 #in-flight=19 > status=CANCELLED: Cancelled > I0525 10:58:53.949005 3937 query-state.cc:396] Cancel: > query_id=444b3fde46797eb0:7006c87000000000 > I0525 10:58:53.949053 3937 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=444b3fde46797eb0:7006c87000000000 refcnt=1 > I0525 10:58:53.954001 1598 data-stream-mgr.cc:247] DeregisterRecvr(): > fragment_instance_id=8447531d1858162e:b9ff6435000000b7, node=19 > I0525 10:58:53.954322 1768 query-state.cc:384] Instance completed. > instance_id=744b5e9cc58577e0:4e0ce9cb000000b7 #in-flight=18 > status=CANCELLED: Cancelled > I0525 10:58:53.954339 1768 query-state.cc:396] Cancel: > query_id=744b5e9cc58577e0:4e0ce9cb00000000 > I0525 10:58:53.954383 1768 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=744b5e9cc58577e0:4e0ce9cb00000000 refcnt=1 > I0525 10:58:53.957116 2722 data-stream-mgr.cc:247] DeregisterRecvr(): > fragment_instance_id=84f0bd5f760af30:acea1867000000b7, node=19 > I0525 10:58:53.959987 2116 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=514c95d054f65673:b518d85600000000 refcnt=1 > I0525 10:58:53.962363 1598 query-state.cc:384] Instance completed. > instance_id=8447531d1858162e:b9ff6435000000b7 #in-flight=17 > status=CANCELLED: Cancelled > I0525 10:58:53.962378 1598 query-state.cc:396] Cancel: > query_id=8447531d1858162e:b9ff643500000000 > I0525 10:58:53.962388 1598 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=8447531d1858162e:b9ff643500000000 refcnt=2 > I0525 10:58:53.964879 2722 query-state.cc:384] Instance completed. > instance_id=84f0bd5f760af30:acea1867000000b7 #in-flight=16 > status=CANCELLED: Cancelled > I0525 10:58:53.964896 2722 query-state.cc:396] Cancel: > query_id=84f0bd5f760af30:acea186700000000 > I0525 10:58:53.964907 2722 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=84f0bd5f760af30:acea186700000000 refcnt=3 > I0525 10:58:53.965350 2385 data-stream-mgr.cc:247] DeregisterRecvr(): > fragment_instance_id=a04a31d6a9e920e5:172857e6000000b7, node=19 > I0525 10:58:53.968524 2385 query-state.cc:384] Instance completed. > instance_id=a04a31d6a9e920e5:172857e6000000b7 #in-flight=15 > status=CANCELLED: Cancelled > I0525 10:58:53.968538 2385 query-state.cc:396] Cancel: > query_id=a04a31d6a9e920e5:172857e600000000 > I0525 10:58:53.968588 2385 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=a04a31d6a9e920e5:172857e600000000 refcnt=1 > I0525 10:58:53.971776 2162 data-stream-mgr.cc:247] DeregisterRecvr(): > fragment_instance_id=b84ba74eafcd7021:d719632e000000b7, node=19 > I0525 10:58:53.972705 1609 data-stream-mgr.cc:247] DeregisterRecvr(): > fragment_instance_id=ca486aed8824f073:727affc3000000b7, node=19 > I0525 10:58:53.975980 2162 query-state.cc:384] Instance completed. > instance_id=b84ba74eafcd7021:d719632e000000b7 #in-flight=14 > status=CANCELLED: Cancelled > I0525 10:58:53.975991 2162 query-state.cc:396] Cancel: > query_id=b84ba74eafcd7021:d719632e00000000 > I0525 10:58:53.977706 1609 query-state.cc:384] Instance completed. > instance_id=ca486aed8824f073:727affc3000000b7 #in-flight=13 > status=CANCELLED: Cancelled > I0525 10:58:53.977766 1609 query-state.cc:396] Cancel: > query_id=ca486aed8824f073:727affc300000000 > I0525 10:58:53.984678 3574 query-state.cc:384] Instance completed. > instance_id=e6453622477ebb04:aa5dbaee0000002a #in-flight=12 > status=CANCELLED: Cancelled > I0525 10:58:53.984694 3574 query-state.cc:396] Cancel: > query_id=e6453622477ebb04:aa5dbaee00000000 > I0525 10:58:53.984706 3574 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=e6453622477ebb04:aa5dbaee00000000 refcnt=2 > I0525 10:58:53.991593 1599 query-state.cc:384] Instance completed. > instance_id=8447531d1858162e:b9ff643500000012 #in-flight=11 > status=CANCELLED: Cancelled > I0525 10:58:53.992913 2376 data-stream-mgr.cc:247] DeregisterRecvr(): > fragment_instance_id=a749127995190814:bc7c6198000000b7, node=19 > I0525 10:58:53.994562 1599 query-state.cc:396] Cancel: > query_id=8447531d1858162e:b9ff643500000000 > I0525 10:58:53.998580 2376 query-state.cc:384] Instance completed. > instance_id=a749127995190814:bc7c6198000000b7 #in-flight=10 > status=CANCELLED: Cancelled > I0525 10:58:53.998597 2376 query-state.cc:396] Cancel: > query_id=a749127995190814:bc7c619800000000 > I0525 10:58:53.998643 2376 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=a749127995190814:bc7c619800000000 refcnt=1 > I0525 10:58:54.000226 1609 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=ca486aed8824f073:727affc300000000 refcnt=1 > I0525 10:58:54.009429 4376 data-stream-mgr.cc:247] DeregisterRecvr(): > fragment_instance_id=84bd395b07ce7e6:5d272387000000b7, node=19 > I0525 10:58:54.012881 4376 query-state.cc:384] Instance completed. > instance_id=84bd395b07ce7e6:5d272387000000b7 #in-flight=9 status=CANCELLED: > Cancelled > I0525 10:58:54.012898 4376 query-state.cc:396] Cancel: > query_id=84bd395b07ce7e6:5d27238700000000 > I0525 10:58:54.012948 4376 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=84bd395b07ce7e6:5d27238700000000 refcnt=1 > I0525 10:58:54.042359 2162 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=b84ba74eafcd7021:d719632e00000000 refcnt=1 > I0525 10:58:54.042918 1513 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=b3416591b07cc9d6:9dd2e8d300000000 refcnt=1 > I0525 10:58:54.103459 1599 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=8447531d1858162e:b9ff643500000000 refcnt=1 > I0525 10:58:54.396687 6291 query-state.cc:396] Cancel: > query_id=6d4980aca4fcbcf3:a81f020700000000 > I0525 10:58:54.404955 6291 query-state.cc:384] Instance completed. > instance_id=6d4980aca4fcbcf3:a81f02070000002a #in-flight=8 > status=CANCELLED: Cancelled > I0525 10:58:54.404983 6291 query-state.cc:396] Cancel: > query_id=6d4980aca4fcbcf3:a81f020700000000 > I0525 10:58:54.405025 6291 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=6d4980aca4fcbcf3:a81f020700000000 refcnt=1 > I0525 10:58:54.740310 1793 query-state.cc:384] Instance completed. > instance_id=9e415937e959fbca:9c42a19400000012 #in-flight=7 > status=CANCELLED: Cancelled > I0525 10:58:54.740325 1793 query-state.cc:396] Cancel: > query_id=9e415937e959fbca:9c42a19400000000 > I0525 10:58:54.740335 1793 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=9e415937e959fbca:9c42a19400000000 refcnt=2 > I0525 10:58:54.823081 1821 query-state.cc:384] Instance completed. > instance_id=9e415937e959fbca:9c42a1940000009f #in-flight=6 > status=CANCELLED: Cancelled > I0525 10:58:54.823103 1821 query-state.cc:396] Cancel: > query_id=9e415937e959fbca:9c42a19400000000 > I0525 10:58:54.836597 4092 query-state.cc:384] Instance completed. > instance_id=e6453622477ebb04:aa5dbaee0000009f #in-flight=5 > status=CANCELLED: Cancelled > I0525 10:58:54.836611 4092 query-state.cc:396] Cancel: > query_id=e6453622477ebb04:aa5dbaee00000000 > I0525 10:58:54.836650 4092 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=e6453622477ebb04:aa5dbaee00000000 refcnt=1 > I0525 10:58:54.908187 1821 query-exec-mgr.cc:149] ReleaseQueryState(): > query_id=9e415937e959fbca:9c42a19400000000 refcnt=1 > I0525 10:58:55.032608 2164 query-state.cc:384] Instance completed. > instance_id=3943c8cb3ad656cd:252cdc180000009f #in-flight=4 > status=CANCELLED: Cancelled > I0525 10:58:55.032662 2164 query-state.cc:396] Cancel: > query_id=3943c8cb3ad656cd:252cdc1800000000 > > > 2018-05-19 12:44 GMT+08:00 Mostafa Mokhtar <[email protected]>: > >> Can you please share the Impalad.INFO log file from the coordinator? >> >> Thanks >> Mostafa >> >>> On May 18, 2018, at 8:36 PM, 蒋鸿翔 <[email protected]> wrote: >>> >>> Hi guys: >>> >>> Our team want to test the apache impala(ver: 2.10.0) by TPCH, when >> running >>> on 20 concurrent queries against 24 dell PowerEdge R730, We found the >> test >>> cause lots of system cpu for a very long time, and no query completed >>> successfully. I don't known what happens, who can give me some advice? >>> >>> some information output by top command: >>> >>> %Cpu0 : 1.6 us, 84.8 sy, 0.0 ni, 13.6 id, 0.0 wa, 0.0 hi, 0.0 si, >>> 0.0 st >>> %Cpu1 : 0.4 us, 86.0 sy, 0.0 ni, 13.6 id, 0.0 wa, 0.0 hi, 0.0 si, >>> 0.0 st >>> %Cpu2 : 0.6 us, 86.0 sy, 0.0 ni, 13.4 id, 0.0 wa, 0.0 hi, 0.0 si, >>> 0.0 st >>> %Cpu3 : 0.0 us, 86.2 sy, 0.0 ni, 13.8 id, 0.0 wa, 0.0 hi, 0.0 si, >>> 0.0 st >>> %Cpu4 : 0.6 us, 87.1 sy, 0.0 ni, 12.3 id, 0.0 wa, 0.0 hi, 0.0 si, >>> 0.0 st >>> %Cpu5 : 1.0 us, 86.3 sy, 0.0 ni, 12.7 id, 0.0 wa, 0.0 hi, 0.0 si, >>> 0.0 st >>> %Cpu6 : 0.4 us, 86.0 sy, 0.0 ni, 13.6 id, 0.0 wa, 0.0 hi, 0.0 si, >>> 0.0 st >>> ..... >>> >>> use "perf top" to see what happens: >>> >>> 84.24% [kernel] [k] KSTK_ESP >>> >>> 7.26% [kernel] [k] cpu_startup_entry >>> >>> 5.56% [kernel] [k] vm_is_stack >>> >>> 0.20% [kernel] [k] number.isra.2 >>> >>> 0.18% [kernel] [k] format_decode >>> ........ >>
