[ 
https://issues.apache.org/jira/browse/IMPALA-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659649#comment-16659649
 ] 

Michael Ho edited comment on IMPALA-7738 at 10/23/18 12:31 AM:
---------------------------------------------------------------

Thanks [~philip]. I think the watchdog approach complements this JIRA (or vice 
versa). We can issue a watchdog warning before the eventual timeout. Also, we 
have noticed in the past that the HDFS client can randomly get stuck at places 
other than RPCs to the NameNode, but I agree that {{ipc.client.rpc-timeout}} is 
another must-have for Impala deployments on HDFS.


was (Author: kwho):
Thanks [~philip]. I think the watchdog approach complements each other. We can 
issue a watchdog warning before the eventual timeout. Also, we have noticed in 
the past that the HDFS client can be randomly stuck at places other than RPC to 
name node.

> Implement timeouts for HDFS calls
> ---------------------------------
>
>                 Key: IMPALA-7738
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7738
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, 
> Impala 2.11.0, Impala 3.0, Impala 2.12.0
>            Reporter: Michael Ho
>            Priority: Critical
>
> Currently, there is no timeout on the various HDFS calls (e.g. hdfsOpen(), 
> hdfsRead()) that we make into libhdfs.so from either the disk-io-mgr thread 
> or the scanner thread context. Various Impala users have complained in the 
> past about hung queries that eventually boiled down to stuck HDFS calls. HDFS 
> maintainers have been slow to find the root cause of those hangs. To make 
> this kind of stuck-query problem easier to identify in the future, we should 
> enforce a timeout on these HDFS calls so that a query fails when an HDFS call 
> takes longer than a designated timeout period.
> There are multiple layers at which this timeout could be enforced:
>  * At the Impala level, we can have a fixed-size thread pool that handles all 
> HDFS calls. Each existing HDFS call would then go through a wrapper with a 
> timeout.
>  * In libhdfs.so, enforce a timeout at the places in the HDFS client code 
> that may block forever.
> The second option is probably beyond the charter of the Apache Impala project.
> cc'ing [[email protected]], [~joemcdonnell]
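
The first option in the description (a fixed-size thread pool whose callers wait with a timeout) could look roughly like the C++17 sketch below. This is only an illustration under stated assumptions: BlockingCallPool and CallWithTimeout are hypothetical names, not Impala's actual thread pool or any libhdfs API, and a real version would need to decide what to do with worker threads that stay stuck.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical fixed-size pool for potentially blocking calls.
// Not Impala's real ThreadPool class; names are for illustration only.
class BlockingCallPool {
 public:
  explicit BlockingCallPool(std::size_t num_threads) {
    assert(num_threads > 0);
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Drains the queue, then joins the workers. Note that this blocks for as
  // long as any in-flight call is still running.
  ~BlockingCallPool() {
    {
      std::lock_guard<std::mutex> l(lock_);
      shutdown_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  // Runs 'fn' on a pool thread and waits up to 'timeout' for its result.
  // Returns std::nullopt on timeout. The call itself is NOT cancelled: it
  // keeps occupying a worker until it returns, so a fixed-size pool can be
  // exhausted if many calls hang. 'fn' must return a non-void value.
  template <typename F>
  auto CallWithTimeout(F fn, std::chrono::milliseconds timeout)
      -> std::optional<decltype(fn())> {
    using T = decltype(fn());
    // shared_ptr keeps the task alive even after the caller gives up.
    auto task = std::make_shared<std::packaged_task<T()>>(std::move(fn));
    std::future<T> result = task->get_future();
    {
      std::lock_guard<std::mutex> l(lock_);
      queue_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    if (result.wait_for(timeout) != std::future_status::ready) {
      return std::nullopt;  // Caller can now fail the query.
    }
    return result.get();
  }

 private:
  void WorkerLoop() {
    while (true) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> l(lock_);
        cv_.wait(l, [this] { return shutdown_ || !queue_.empty(); });
        if (queue_.empty()) return;  // Shutting down and nothing left to run.
        job = std::move(queue_.front());
        queue_.pop();
      }
      job();
    }
  }

  std::mutex lock_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  std::vector<std::thread> workers_;
  bool shutdown_ = false;
};
```

A caller would wrap a blocking call such as hdfsRead() in a lambda, pass it to CallWithTimeout() with the designated timeout, and fail the query when std::nullopt comes back. The watchdog warning mentioned in the comment above could be layered on top by waiting once with a shorter "warn" deadline, logging, and then waiting for the remainder.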



