[ 
https://issues.apache.org/jira/browse/IMPALA-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab updated IMPALA-6423:
---------------------------------
    Fix Version/s: Impala 2.12.0

> HDFS scanners do not get cancelled in a timely manner
> -----------------------------------------------------
>
>                 Key: IMPALA-6423
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6423
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.11.0
>            Reporter: Quanlong Huang
>            Assignee: Gabor Kaszab
>            Priority: Critical
>             Fix For: Impala 2.12.0
>
>         Attachments: hdfs_scan_node_cancellation.log
>
>
> The non-MT scanner (MT_DOP=0) does not check RuntimeState::is_cancelled() for 
> cancellation. Instead, it checks ScannerContext::cancelled() which only 
> returns HdfsScanNode:: Done(). Thus, the scanner can't notice the 
> cancellation in time. The FragmentInstance of it will still try to send out 
> data until it reaches the 2m time-out.
>  This can be reproduced by
> {code:java}
> bin/impala-shell.sh -q "set debug_action=1:PREPARE:FAIL; select * from 
> functional_parquet.alltypes"
> {code}
> Then tail -f the log. You can find logs like
> {code:java}
> I0119 05:39:25.453212 14051 impala-server.cc:1165] ReportExecStatus(): 
> Received report for unknown query ID (probably closed or cancelled): 
> 714ff9fe107a7c25:a2344b1800000000
> I0119 05:39:25.456718 14054 impala-server.cc:1165] ReportExecStatus(): 
> Received report for unknown query ID (probably closed or cancelled): 
> 714ff9fe107a7c25:a2344b1800000000
> I0119 05:39:30.408521 13825 impala-server.cc:1165] ReportExecStatus(): 
> Received report for unknown query ID (probably closed or cancelled): 
> 714ff9fe107a7c25:a2344b1800000000
> I0119 05:39:30.408646 14022 query-state.cc:401] Cancel: 
> query_id=714ff9fe107a7c25:a2344b1800000000
> I0119 05:39:35.409497 13825 impala-server.cc:1165] ReportExecStatus(): 
> Received report for unknown query ID (probably closed or cancelled): 
> 714ff9fe107a7c25:a2344b1800000000
> I0119 05:39:35.409601 14022 query-state.cc:401] Cancel: 
> query_id=714ff9fe107a7c25:a2344b1800000000
> ......
> I0119 05:41:25.429842 13825 impala-server.cc:1165] ReportExecStatus(): 
> Received report for unknown query ID (probably closed or cancelled): 
> 714ff9fe107a7c25:a2344b1800000000
> I0119 05:41:25.429960 14022 query-state.cc:401] Cancel: 
> query_id=714ff9fe107a7c25:a2344b1800000000
> {code}
> and finally the time-out logs
> {code:java}
> I0119 05:41:25.445632 13811 data-stream-mgr.cc:130] Datastream sender 
> timed-out waiting for recvr for fragment instance: 
> 714ff9fe107a7c25:a2344b1800000000 (time-out was: 2m). Increase 
> --datastream_sender_timeout_ms if you see this message frequently.
> I0119 05:41:25.445688 13811 data-stream-mgr.cc:191] DataStreamMgr::AddData(): 
> Sender timed out waiting for receiver fragment instance: 
> 714ff9fe107a7c25:a2344b1800000000, dest node: 1
> E0119 05:41:25.445838 14015 data-stream-sender.cc:278] channel send to 
> iadhadoop-c39:22000 failed: Sender timed out waiting for receiver fragment 
> instance: 714ff9fe107a7c25:a2344b1800000000, dest node: 1
> I0119 05:41:25.446475 13825 impala-server.cc:1165] ReportExecStatus(): 
> Received report for unknown query ID (probably closed or cancelled): 
> 714ff9fe107a7c25:a2344b1800000000
> I0119 05:41:25.446524 14015 query-state.cc:401] Cancel: 
> query_id=714ff9fe107a7c25:a2344b1800000000
> I0119 05:41:25.447132 14015 query-state.cc:388] Instance completed. 
> instance_id=714ff9fe107a7c25:a2344b1800000002 #in-flight=0 
> status=DATASTREAM_SENDER_TIMEOUT: Sender timed out waiting for receiver 
> fragment instance: 714ff9fe107a7c25:a2344b1800000000, dest node: 1 
> I0119 05:41:25.447149 14015 query-state.cc:401] Cancel: 
> query_id=714ff9fe107a7c25:a2344b1800000000
> I0119 05:41:25.447201 14015 query-exec-mgr.cc:149] ReleaseQueryState(): 
> query_id=714ff9fe107a7c25:a2344b1800000000 refcnt=1
> I0119 05:41:25.448110 13843 data-stream-mgr.cc:130] Datastream sender 
> timed-out waiting for recvr for fragment instance: 
> 714ff9fe107a7c25:a2344b1800000000 (time-out was: 2m). Increase 
> --datastream_sender_timeout_ms if you see this message frequently.
> I0119 05:41:25.448130 13843 data-stream-mgr.cc:191] DataStreamMgr::AddData(): 
> Sender timed out waiting for receiver fragment instance: 
> 714ff9fe107a7c25:a2344b1800000000, dest node: 1
> I0119 05:41:25.454107 13852 data-stream-mgr.cc:130] Datastream sender 
> timed-out waiting for recvr for fragment instance: 
> 714ff9fe107a7c25:a2344b1800000000 (time-out was: 2m). Increase 
> --datastream_sender_timeout_ms if you see this message frequently.
> I0119 05:41:25.454149 13852 data-stream-mgr.cc:191] DataStreamMgr::AddData(): 
> Sender timed out waiting for receiver fragment instance: 
> 714ff9fe107a7c25:a2344b1800000000, dest node: 1
> {code}
> This may not always happen. Just have a few try and you will reproduce it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to