IMPALA-6549: Enable file handle cache by default

The file handle cache was disabled by default
due to two HDFS issues: HDFS-12528 and HDFS-14872.
Both have been fixed and the CDH components in
the toolchain include both fixes.

This reenables the file handle cache by default.

Change-Id: I6935825a1c4c7b2da0bb877f732027be1a57a8b7
Reviewed-by: Joe McDonnell <>
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong <>
Tested-by: Tim Armstrong <>


Branch: refs/heads/2.x
Commit: 876f289fe005a5bb9084d6d3176dfaa11cfa7271
Parents: 74e7245
Author: Joe McDonnell <>
Authored: Tue Feb 20 16:37:29 2018 -0800
Committer: Tim Armstrong <>
Committed: Sat Feb 24 01:58:46 2018 +0000

 be/src/runtime/io/ | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/be/src/runtime/io/ b/be/src/runtime/io/
index 6c7b9e6..0ac3669 100644
--- a/be/src/runtime/io/
+++ b/be/src/runtime/io/
@@ -98,10 +98,7 @@ DEFINE_int32(max_free_io_buffers, 128,
 // uses about 6kB of memory. 20k file handles will thus reserve ~120MB of 
 // The actual amount of memory that is associated with a file handle can be 
 // or smaller, depending on the replication factor for this file or the path 
-// TODO: This is currently disabled due to HDFS-12528, which can disable short 
-// reads when file handle caching is enabled. This should be reenabled by 
-// when that issue is fixed.
-DEFINE_uint64(max_cached_file_handles, 0, "Maximum number of HDFS file handles "
+DEFINE_uint64(max_cached_file_handles, 20000, "Maximum number of HDFS file handles "
     "that will be cached. Disabled if set to 0.");
 // The unused file handle timeout specifies how long a file handle will remain in the
@@ -112,11 +109,12 @@ DEFINE_uint64(max_cached_file_handles, 0, "Maximum number of HDFS file handles "
 // If a file is deleted through HDFS, this open file descriptor can keep the disk space
 // from being freed. When the metadata sees that a file has been deleted, the file handle
 // will no longer be used by future queries. Aging out this file handle allows 
-// disk space to be freed in an appropriate period of time.
-// TODO: HDFS-12528 (which can disable short circuit reads) is more likely to 
-// if file handles are cached for longer than 5 minutes. Use a conservative value for
-// the unused file handle cache timeout until HDFS-12528 is fixed.
-DEFINE_uint64(unused_file_handle_timeout_sec, 270, "Maximum time, in seconds, that an "
+// disk space to be freed in an appropriate period of time. The default value 
+// 6 hours. This was chosen to be less than a typical value for HDFS's 
+// This means that when files are deleted via the trash, the file handle cache 
+// have evicted the file handle before the files are flushed from the trash. 
+// means that the file handle cache won't impact available disk space.
+DEFINE_uint64(unused_file_handle_timeout_sec, 21600, "Maximum time, in seconds, that an "
     "unused HDFS file handle will remain in the file handle cache. Disabled if set "
     "to 0.");
