[ https://issues.apache.org/jira/browse/IMPALA-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660878#comment-16660878 ]
Lars Volker commented on IMPALA-7738:
-------------------------------------
We should check if there are asynchronous APIs available for some of our
dependencies. Wrapping the calls in a thread pool effectively makes them async.
For {{ipc.client.rpc-timeout}} I was recently painfully reminded that it
controls the internal RPC mechanism of HDFS, *not* the API call. When a
Namenode gets stuck, the HDFS-internal call will time out and HDFS will try to
fail over to the secondary NN. If that one is stuck too, it will cycle between
the primary and secondary 15 times (by default). Thus from an API view you will
see an effective timeout of {{15 * ipc.client.rpc-timeout}}.
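To make that arithmetic concrete, here is a minimal sketch. The helper name {{EffectiveApiTimeout}} is hypothetical, and the default of 15 attempts is taken from the description above (it likely maps to the {{dfs.client.failover.max.attempts}} setting, but treat that mapping as an assumption). It also ignores any back-off sleep between failover attempts, so the real worst case may be longer.

```cpp
#include <chrono>

// Hypothetical helper, not part of Impala or libhdfs.
// Worst-case time an HDFS API call may block when every NameNode is stuck:
// each internal RPC waits for 'rpc_timeout', and the client fails over
// 'failover_attempts' times before giving up.
constexpr std::chrono::milliseconds EffectiveApiTimeout(
    std::chrono::milliseconds rpc_timeout, int failover_attempts = 15) {
  return rpc_timeout * failover_attempts;
}

// Example: a 60 second RPC timeout can surface as a 15 minute API-level hang.
static_assert(EffectiveApiTimeout(std::chrono::seconds(60)) ==
                  std::chrono::minutes(15),
              "15 attempts of 60 s each add up to 15 min");
```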
Unless we are very certain that API calls with timeouts work as expected, I'm
in favor of wrapping them in a thread pool.
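A rough sketch of what such a wrapper could look like, using only the C++ standard library. The class name {{BlockingCallPool}} and the method {{CallWithTimeout}} are made up for illustration; this is not Impala's existing thread pool and it does not call libhdfs. Note that a timed-out call is only abandoned, not cancelled: it keeps occupying its worker thread until the underlying call returns.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool for running blocking calls with a deadline.
class BlockingCallPool {
 public:
  explicit BlockingCallPool(std::size_t num_threads) {
    assert(num_threads > 0);
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Drains the queue and joins all workers. This blocks for as long as any
  // in-flight call blocks, so a truly hung call would also hang shutdown.
  ~BlockingCallPool() {
    {
      std::lock_guard<std::mutex> l(mu_);
      shutdown_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  // Runs 'fn' on a pool thread. Returns true and stores the value in
  // '*result' if it finished within 'timeout'; returns false otherwise.
  // The timeout includes time spent waiting in the queue for a free worker.
  template <typename T>
  bool CallWithTimeout(std::function<T()> fn,
                       std::chrono::milliseconds timeout, T* result) {
    // shared_ptr keeps the task alive even if the caller gives up on it.
    auto task = std::make_shared<std::packaged_task<T()>>(std::move(fn));
    std::future<T> done = task->get_future();
    {
      std::lock_guard<std::mutex> l(mu_);
      queue_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    if (done.wait_for(timeout) != std::future_status::ready) return false;
    *result = done.get();
    return true;
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [this] { return shutdown_ || !queue_.empty(); });
        if (queue_.empty()) return;  // shutting down and no work left
        job = std::move(queue_.front());
        queue_.pop();
      }
      job();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool shutdown_ = false;
  std::vector<std::thread> workers_;  // declared last: threads use the rest
};
```

A caller would write something like {{pool.CallWithTimeout<int>(fn, std::chrono::milliseconds(500), &ret)}}, where {{fn}} wraps the blocking HDFS call, and fail the query when it returns false. The result type has to be spelled out explicitly because it cannot be deduced from a lambda. Since stuck calls pin their workers, the pool size bounds how many hung calls can pile up before new calls start timing out in the queue.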
> Implement timeouts for HDFS calls
> ---------------------------------
>
> Key: IMPALA-7738
> URL: https://issues.apache.org/jira/browse/IMPALA-7738
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0,
> Impala 2.11.0, Impala 3.0, Impala 2.12.0
> Reporter: Michael Ho
> Priority: Critical
>
> Currently, there is no timeout on the various HDFS calls (e.g. hdfsOpen(),
> hdfsRead()) we make in libhdfs.so in either the disk-io-mgr thread or scanner
> thread context. Various users of Impala have complained in the past about hung
> queries which eventually boiled down to stuck HDFS calls. HDFS maintainers
> have been slow to find the root cause of those hangs. To make this kind of
> stuck-query problem easier to identify in the future, we should just
> enforce a timeout on the various HDFS calls so the queries will fail when certain
> HDFS calls take longer than a designated timeout period.
> There may be multiple layers at which this timeout can be enforced:
> * at the Impala level, we can have a fixed-size thread pool which handles all
> HDFS calls. The existing HDFS calls will become wrappers with a timeout.
> * at libhdfs.so, enforce a timeout at places in the HDFS client code which
> may block forever.
> The second option is probably beyond the charter of the Apache Impala project.
> cc'ing [[email protected]], [~joemcdonnell]