Hi Hongxiang,
You can also look into logs of Impala.WARNING and Impala.ERROR. There may be some useful messages if you encounter server-level issues, e.g. OOM in Impalad side, failure in processing impalad catalog update, etc. Besides, tracking metrics in http://$IMPALAD_ADDRESS:25000/metrics?json for every impalad is also helpful for understanding what happens. Regards, Quanlong At 2018-05-25 11:35:27,"Mostafa Mokhtar" <[email protected]> wrote: >High CPU utilization is a side effect of cancellations, that issue is fixed in >newer releases. >For now try to figure why the queries are failing or getting cancelled. > >Thanks >Mostafa > >> On May 24, 2018, at 8:01 PM, 蒋鸿翔 <[email protected]> wrote: >> >> I find lots of query Cancel in the impala.INFO file like below: >> >> I0525 10:58:53.943994 1768 data-stream-mgr.cc:247] DeregisterRecvr(): >> fragment_instance_id=744b5e9cc58577e0:4e0ce9cb000000b7, node=19 >> I0525 10:58:53.948987 3937 query-state.cc:384] Instance completed. >> instance_id=444b3fde46797eb0:7006c870000000b7 #in-flight=19 >> status=CANCELLED: Cancelled >> I0525 10:58:53.949005 3937 query-state.cc:396] Cancel: >> query_id=444b3fde46797eb0:7006c87000000000 >> I0525 10:58:53.949053 3937 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=444b3fde46797eb0:7006c87000000000 refcnt=1 >> I0525 10:58:53.954001 1598 data-stream-mgr.cc:247] DeregisterRecvr(): >> fragment_instance_id=8447531d1858162e:b9ff6435000000b7, node=19 >> I0525 10:58:53.954322 1768 query-state.cc:384] Instance completed. >> instance_id=744b5e9cc58577e0:4e0ce9cb000000b7 #in-flight=18 >> status=CANCELLED: Cancelled >> I0525 10:58:53.954339 1768 query-state.cc:396] Cancel: >> query_id=744b5e9cc58577e0:4e0ce9cb00000000 >> I0525 10:58:53.954383 1768 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=744b5e9cc58577e0:4e0ce9cb00000000 refcnt=1 >> I0525 10:58:53.957116 2722 data-stream-mgr.cc:247] DeregisterRecvr(): >> fragment_instance_id=84f0bd5f760af30:acea1867000000b7, node=19 >> I0525 10:58:53.959987 2116 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=514c95d054f65673:b518d85600000000 refcnt=1 >> I0525 10:58:53.962363 1598 query-state.cc:384] Instance completed. >> instance_id=8447531d1858162e:b9ff6435000000b7 #in-flight=17 >> status=CANCELLED: Cancelled >> I0525 10:58:53.962378 1598 query-state.cc:396] Cancel: >> query_id=8447531d1858162e:b9ff643500000000 >> I0525 10:58:53.962388 1598 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=8447531d1858162e:b9ff643500000000 refcnt=2 >> I0525 10:58:53.964879 2722 query-state.cc:384] Instance completed. >> instance_id=84f0bd5f760af30:acea1867000000b7 #in-flight=16 >> status=CANCELLED: Cancelled >> I0525 10:58:53.964896 2722 query-state.cc:396] Cancel: >> query_id=84f0bd5f760af30:acea186700000000 >> I0525 10:58:53.964907 2722 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=84f0bd5f760af30:acea186700000000 refcnt=3 >> I0525 10:58:53.965350 2385 data-stream-mgr.cc:247] DeregisterRecvr(): >> fragment_instance_id=a04a31d6a9e920e5:172857e6000000b7, node=19 >> I0525 10:58:53.968524 2385 query-state.cc:384] Instance completed. >> instance_id=a04a31d6a9e920e5:172857e6000000b7 #in-flight=15 >> status=CANCELLED: Cancelled >> I0525 10:58:53.968538 2385 query-state.cc:396] Cancel: >> query_id=a04a31d6a9e920e5:172857e600000000 >> I0525 10:58:53.968588 2385 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=a04a31d6a9e920e5:172857e600000000 refcnt=1 >> I0525 10:58:53.971776 2162 data-stream-mgr.cc:247] DeregisterRecvr(): >> fragment_instance_id=b84ba74eafcd7021:d719632e000000b7, node=19 >> I0525 10:58:53.972705 1609 data-stream-mgr.cc:247] DeregisterRecvr(): >> fragment_instance_id=ca486aed8824f073:727affc3000000b7, node=19 >> I0525 10:58:53.975980 2162 query-state.cc:384] Instance completed. >> instance_id=b84ba74eafcd7021:d719632e000000b7 #in-flight=14 >> status=CANCELLED: Cancelled >> I0525 10:58:53.975991 2162 query-state.cc:396] Cancel: >> query_id=b84ba74eafcd7021:d719632e00000000 >> I0525 10:58:53.977706 1609 query-state.cc:384] Instance completed. >> instance_id=ca486aed8824f073:727affc3000000b7 #in-flight=13 >> status=CANCELLED: Cancelled >> I0525 10:58:53.977766 1609 query-state.cc:396] Cancel: >> query_id=ca486aed8824f073:727affc300000000 >> I0525 10:58:53.984678 3574 query-state.cc:384] Instance completed. >> instance_id=e6453622477ebb04:aa5dbaee0000002a #in-flight=12 >> status=CANCELLED: Cancelled >> I0525 10:58:53.984694 3574 query-state.cc:396] Cancel: >> query_id=e6453622477ebb04:aa5dbaee00000000 >> I0525 10:58:53.984706 3574 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=e6453622477ebb04:aa5dbaee00000000 refcnt=2 >> I0525 10:58:53.991593 1599 query-state.cc:384] Instance completed. >> instance_id=8447531d1858162e:b9ff643500000012 #in-flight=11 >> status=CANCELLED: Cancelled >> I0525 10:58:53.992913 2376 data-stream-mgr.cc:247] DeregisterRecvr(): >> fragment_instance_id=a749127995190814:bc7c6198000000b7, node=19 >> I0525 10:58:53.994562 1599 query-state.cc:396] Cancel: >> query_id=8447531d1858162e:b9ff643500000000 >> I0525 10:58:53.998580 2376 query-state.cc:384] Instance completed. >> instance_id=a749127995190814:bc7c6198000000b7 #in-flight=10 >> status=CANCELLED: Cancelled >> I0525 10:58:53.998597 2376 query-state.cc:396] Cancel: >> query_id=a749127995190814:bc7c619800000000 >> I0525 10:58:53.998643 2376 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=a749127995190814:bc7c619800000000 refcnt=1 >> I0525 10:58:54.000226 1609 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=ca486aed8824f073:727affc300000000 refcnt=1 >> I0525 10:58:54.009429 4376 data-stream-mgr.cc:247] DeregisterRecvr(): >> fragment_instance_id=84bd395b07ce7e6:5d272387000000b7, node=19 >> I0525 10:58:54.012881 4376 query-state.cc:384] Instance completed. >> instance_id=84bd395b07ce7e6:5d272387000000b7 #in-flight=9 status=CANCELLED: >> Cancelled >> I0525 10:58:54.012898 4376 query-state.cc:396] Cancel: >> query_id=84bd395b07ce7e6:5d27238700000000 >> I0525 10:58:54.012948 4376 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=84bd395b07ce7e6:5d27238700000000 refcnt=1 >> I0525 10:58:54.042359 2162 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=b84ba74eafcd7021:d719632e00000000 refcnt=1 >> I0525 10:58:54.042918 1513 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=b3416591b07cc9d6:9dd2e8d300000000 refcnt=1 >> I0525 10:58:54.103459 1599 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=8447531d1858162e:b9ff643500000000 refcnt=1 >> I0525 10:58:54.396687 6291 query-state.cc:396] Cancel: >> query_id=6d4980aca4fcbcf3:a81f020700000000 >> I0525 10:58:54.404955 6291 query-state.cc:384] Instance completed. >> instance_id=6d4980aca4fcbcf3:a81f02070000002a #in-flight=8 >> status=CANCELLED: Cancelled >> I0525 10:58:54.404983 6291 query-state.cc:396] Cancel: >> query_id=6d4980aca4fcbcf3:a81f020700000000 >> I0525 10:58:54.405025 6291 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=6d4980aca4fcbcf3:a81f020700000000 refcnt=1 >> I0525 10:58:54.740310 1793 query-state.cc:384] Instance completed. >> instance_id=9e415937e959fbca:9c42a19400000012 #in-flight=7 >> status=CANCELLED: Cancelled >> I0525 10:58:54.740325 1793 query-state.cc:396] Cancel: >> query_id=9e415937e959fbca:9c42a19400000000 >> I0525 10:58:54.740335 1793 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=9e415937e959fbca:9c42a19400000000 refcnt=2 >> I0525 10:58:54.823081 1821 query-state.cc:384] Instance completed. >> instance_id=9e415937e959fbca:9c42a1940000009f #in-flight=6 >> status=CANCELLED: Cancelled >> I0525 10:58:54.823103 1821 query-state.cc:396] Cancel: >> query_id=9e415937e959fbca:9c42a19400000000 >> I0525 10:58:54.836597 4092 query-state.cc:384] Instance completed. >> instance_id=e6453622477ebb04:aa5dbaee0000009f #in-flight=5 >> status=CANCELLED: Cancelled >> I0525 10:58:54.836611 4092 query-state.cc:396] Cancel: >> query_id=e6453622477ebb04:aa5dbaee00000000 >> I0525 10:58:54.836650 4092 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=e6453622477ebb04:aa5dbaee00000000 refcnt=1 >> I0525 10:58:54.908187 1821 query-exec-mgr.cc:149] ReleaseQueryState(): >> query_id=9e415937e959fbca:9c42a19400000000 refcnt=1 >> I0525 10:58:55.032608 2164 query-state.cc:384] Instance completed. >> instance_id=3943c8cb3ad656cd:252cdc180000009f #in-flight=4 >> status=CANCELLED: Cancelled >> I0525 10:58:55.032662 2164 query-state.cc:396] Cancel: >> query_id=3943c8cb3ad656cd:252cdc1800000000 >> >> >> 2018-05-19 12:44 GMT+08:00 Mostafa Mokhtar <[email protected]>: >> >>> Can you please share the Impalad.INFO log file from the coordinator? >>> >>> Thanks >>> Mostafa >>> >>>> On May 18, 2018, at 8:36 PM, 蒋鸿翔 <[email protected]> wrote: >>>> >>>> Hi guys: >>>> >>>> Our team want to test the apache impala(ver: 2.10.0) by TPCH, when >>> running >>>> on 20 concurrent queries against 24 dell PowerEdge R730, We found the >>> test >>>> cause lots of system cpu for a very long time, and no query completed >>>> successfully. I don't known what happens, who can give me some advice? >>>> >>>> some information output by top command: >>>> >>>> %Cpu0 : 1.6 us, 84.8 sy, 0.0 ni, 13.6 id, 0.0 wa, 0.0 hi, 0.0 si, >>>> 0.0 st >>>> %Cpu1 : 0.4 us, 86.0 sy, 0.0 ni, 13.6 id, 0.0 wa, 0.0 hi, 0.0 si, >>>> 0.0 st >>>> %Cpu2 : 0.6 us, 86.0 sy, 0.0 ni, 13.4 id, 0.0 wa, 0.0 hi, 0.0 si, >>>> 0.0 st >>>> %Cpu3 : 0.0 us, 86.2 sy, 0.0 ni, 13.8 id, 0.0 wa, 0.0 hi, 0.0 si, >>>> 0.0 st >>>> %Cpu4 : 0.6 us, 87.1 sy, 0.0 ni, 12.3 id, 0.0 wa, 0.0 hi, 0.0 si, >>>> 0.0 st >>>> %Cpu5 : 1.0 us, 86.3 sy, 0.0 ni, 12.7 id, 0.0 wa, 0.0 hi, 0.0 si, >>>> 0.0 st >>>> %Cpu6 : 0.4 us, 86.0 sy, 0.0 ni, 13.6 id, 0.0 wa, 0.0 hi, 0.0 si, >>>> 0.0 st >>>> ..... >>>> >>>> use "perf top" to see what happens: >>>> >>>> 84.24% [kernel] [k] KSTK_ESP >>>> >>>> 7.26% [kernel] [k] cpu_startup_entry >>>> >>>> 5.56% [kernel] [k] vm_is_stack >>>> >>>> 0.20% [kernel] [k] number.isra.2 >>>> >>>> 0.18% [kernel] [k] format_decode >>>> ........ >>>
