[ https://issues.apache.org/jira/browse/IMPALA-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712325#comment-16712325 ]
ASF subversion and git services commented on IMPALA-7738:
---------------------------------------------------------
Commit 938be0e840c84263a2b47fb89e655d998363b819 in impala's branch
refs/heads/master from [~joemcdonnell]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=938be0e ]
IMPALA-7738: Implement timeouts for HDFS open calls
This is part 1 of a push to add timeouts for all HDFS operations.
It adds timeouts for opening an HDFS file handle.
It introduces a new SynchronousThreadPool, which executes
an operation in a thread pool and waits up to a specified
timeout for the operation to complete. This type of thread
pool can accept any subclass of SynchronousWorkItem, and
a single thread pool can process different types of work
items. It is tested by a new test case in thread-pool-test.
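To illustrate the idea behind SynchronousThreadPool, here is a minimal Python sketch (not the Impala C++ implementation). It assumes a work item exposes a single execute() method; the names synchronous_offer and SleepItem are hypothetical and were chosen only for this example. The caller submits an item, then blocks for at most the given timeout waiting for it to finish.

```python
import concurrent.futures
import time


class SynchronousWorkItem:
    """Hypothetical base class: subclasses implement execute()."""

    def execute(self):
        raise NotImplementedError


class SleepItem(SynchronousWorkItem):
    """Illustrative work item that sleeps, standing in for a blocking call."""

    def __init__(self, seconds):
        self.seconds = seconds

    def execute(self):
        time.sleep(self.seconds)
        return "done"


class SynchronousThreadPool:
    """Runs work items on pool threads while the caller waits with a timeout."""

    def __init__(self, num_threads):
        self._executor = concurrent.futures.ThreadPoolExecutor(
            max_workers=num_threads)

    def synchronous_offer(self, item, timeout_sec):
        """Return (True, result) on completion, (False, None) on timeout."""
        future = self._executor.submit(item.execute)
        try:
            return True, future.result(timeout=timeout_sec)
        except concurrent.futures.TimeoutError:
            # The caller stops waiting; the worker thread is NOT interrupted
            # and stays occupied until the underlying call returns.
            return False, None

    def shutdown(self):
        # Do not block on work items that may still be stuck.
        self._executor.shutdown(wait=False)
```

Because any object with an execute() method can be submitted, one pool can serve different kinds of work items. Note that a timeout only releases the caller: the stuck operation keeps holding its pool thread, so the pool size bounds how many hung operations can be tolerated at once.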
This also introduces a new HdfsMonitor which implements
timeouts for HDFS operations, currently limited to
hdfsOpenFile(). This is implemented using a SynchronousThreadPool.
The timeout for HDFS operations is specified by
hdfs_operation_timeout_sec, which defaults to 5 minutes.
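As a rough illustration of how a monitor can put a timeout around a blocking open call, here is a Python sketch under stated assumptions. It is not the HdfsMonitor code from this commit: open_with_timeout, HdfsOperationTimeout and fake_open are hypothetical names, and fake_open only simulates a blocking call rather than calling hdfsOpenFile(). The 300-second constant mirrors the 5-minute default described above.

```python
import concurrent.futures
import time

# Mirrors the 5-minute default described for hdfs_operation_timeout_sec.
HDFS_OPERATION_TIMEOUT_SEC = 300


class HdfsOperationTimeout(Exception):
    """Hypothetical error raised when an operation exceeds the timeout."""


def fake_open(path, delay_sec=0.0):
    """Stand-in for a blocking open call; sleeps to simulate a slow NameNode."""
    time.sleep(delay_sec)
    return "handle:" + path


class HdfsMonitor:
    """Runs blocking operations on a thread pool and enforces a timeout."""

    def __init__(self, num_threads=4, timeout_sec=HDFS_OPERATION_TIMEOUT_SEC):
        self._timeout_sec = timeout_sec
        self._executor = concurrent.futures.ThreadPoolExecutor(
            max_workers=num_threads)

    def open_with_timeout(self, open_fn, path, *args):
        """Call open_fn(path, *args), failing if it exceeds the timeout."""
        future = self._executor.submit(open_fn, path, *args)
        try:
            return future.result(timeout=self._timeout_sec)
        except concurrent.futures.TimeoutError:
            # Surface the hang as an error instead of blocking the caller.
            raise HdfsOperationTimeout(
                "open of %s exceeded %s seconds" % (path, self._timeout_sec)
            ) from None

    def shutdown(self):
        # Do not wait for operations that may still be stuck.
        self._executor.shutdown(wait=False)
```

With this shape, a caller that would otherwise hang gets an error after the timeout and can fail the query, which is the behavior the issue asks for.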
Testing:
1. Added a test to thread-pool-test for the new
SynchronousThreadPool.
2. Core tests
3. Added a custom cluster test that does "kill -STOP"
for the NameNode and verifies that a subsequent
hdfsOpenFile operation times out.
Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454
Reviewed-on: http://gerrit.cloudera.org:8080/11874
Reviewed-by: Joe McDonnell <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Implement timeouts for HDFS calls
> ---------------------------------
>
> Key: IMPALA-7738
> URL: https://issues.apache.org/jira/browse/IMPALA-7738
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0,
> Impala 2.11.0, Impala 3.0, Impala 2.12.0
> Reporter: Michael Ho
> Assignee: Joe McDonnell
> Priority: Critical
>
> Currently, there is no timeout on the various HDFS calls (e.g. hdfsOpenFile(),
> hdfsRead()) we make into libhdfs.so from either the disk-io-mgr thread or the
> scanner thread context. Various users of Impala have complained in the past
> about hung queries which eventually boiled down to stuck HDFS calls. HDFS
> maintainers have been slow to find the root cause of those hangs. To make this
> kind of stuck-query problem easier to identify in the future, we should
> enforce a timeout on the various HDFS calls so that queries fail when an
> HDFS call takes longer than a designated timeout period.
> There are multiple layers at which this timeout could be enforced:
> * At the Impala level, we can have a fixed-size thread pool which handles all
> HDFS calls. The existing HDFS calls would be wrapped with a timeout.
> * In libhdfs.so, enforce a timeout at places in the HDFS client code which
> may block forever.
> The second option is probably beyond the charter of the Apache Impala project.
> cc'ing [[email protected]], [~joemcdonnell]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)