IMPALA-6813: Hedged reads metrics broken when scanning non-HDFS based table

We realized that the libHDFS API call hdfsGetHedgedReadMetrics() crashes
when the 'fs' argument passed to it is not an HDFS filesystem.

There is an open bug for this on the HDFS side: HDFS-13417.
However, it looks like we won't be getting a fix for it in the short term,
so our only option at this point is to skip the call for non-HDFS files.
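
The guard can be illustrated with a minimal Python sketch. This is not the
Impala implementation: is_hdfs_path(), update_hedged_read_metrics(), the
default_fs value and the get_metrics callback are hypothetical stand-ins for
IsHdfsPath(), the block in ScanRange::Close() and hdfsGetHedgedReadMetrics(),
and the scheme check is only an assumption about how such a test could work.

```python
from urllib.parse import urlparse


def is_hdfs_path(path, default_fs="hdfs://namenode:8020"):
    """Hypothetical stand-in for IsHdfsPath(): true if the path is on HDFS.

    Assumption: a path without a scheme resolves against the default
    filesystem, whose URI here is an illustrative value only.
    """
    scheme = urlparse(path).scheme
    if not scheme:
        scheme = urlparse(default_fs).scheme
    return scheme == "hdfs"


def update_hedged_read_metrics(path, use_hdfs_pread, get_metrics):
    """Fetch hedged read metrics only when it is safe to do so.

    get_metrics is a hypothetical callback standing in for the libhdfs
    call that crashes on non-HDFS filesystems, so it must not be invoked
    unless preads are enabled and the file lives on HDFS.
    """
    if not (use_hdfs_pread and is_hdfs_path(path)):
        return None
    return get_metrics()
```

With this shape, a scan of an s3a:// file never reaches the metrics call,
while an hdfs:// file behaves as before when preads are enabled.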

Testing: Made sure that enabling preads and scanning from S3 doesn't
cause a crash.
Also, added a custom cluster test to exercise the pread code path. We
are unable to verify hedged reads in a minicluster, but we can at least
exercise the code path to make sure that nothing breaks.

Change-Id: I48fe80dfd9a1ed68a8f2b7038e5f42b5a3df3baa
Reviewed-on: http://gerrit.cloudera.org:8080/9966
Reviewed-by: Sailesh Mukil <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/a3efde84
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/a3efde84
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/a3efde84

Branch: refs/heads/2.x
Commit: a3efde84a5e0ef17357d24c3e69aa3f255eb4865
Parents: 466188b
Author: Sailesh Mukil <[email protected]>
Authored: Mon Apr 9 15:26:06 2018 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Fri May 25 23:17:16 2018 +0000

----------------------------------------------------------------------
 be/src/runtime/io/scan-range.cc           |  4 +++-
 tests/custom_cluster/test_hedged_reads.py | 30 ++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/a3efde84/be/src/runtime/io/scan-range.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/io/scan-range.cc b/be/src/runtime/io/scan-range.cc
index c868c3d..409e743 100644
--- a/be/src/runtime/io/scan-range.cc
+++ b/be/src/runtime/io/scan-range.cc
@@ -498,11 +498,13 @@ void ScanRange::Close() {
       closed_file = true;
     }
 
-    if (FLAGS_use_hdfs_pread) {
+    if (FLAGS_use_hdfs_pread && IsHdfsPath(file())) {
       // Update Hedged Read Metrics.
      // We call it only if the --use_hdfs_pread flag is set, to avoid having the
      // libhdfs client malloc and free a hdfsHedgedReadMetrics object unnecessarily
       // otherwise. 'hedged_metrics' is only set upon success.
+      // We also avoid calling hdfsGetHedgedReadMetrics() when the file is not on HDFS
+      // (see HDFS-13417).
       struct hdfsHedgedReadMetrics* hedged_metrics;
       int success = hdfsGetHedgedReadMetrics(fs_, &hedged_metrics);
       if (success == 0) {

http://git-wip-us.apache.org/repos/asf/impala/blob/a3efde84/tests/custom_cluster/test_hedged_reads.py
----------------------------------------------------------------------
diff --git a/tests/custom_cluster/test_hedged_reads.py b/tests/custom_cluster/test_hedged_reads.py
new file mode 100644
index 0000000..b24fd92
--- /dev/null
+++ b/tests/custom_cluster/test_hedged_reads.py
@@ -0,0 +1,30 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pytest
+from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
+from tests.common.skip import SkipIf
+
+@SkipIf.not_hdfs
+class TestHedgedReads(CustomClusterTestSuite):
+  """ Exercises the hedged reads code path.
+      NOTE: We unfortunately cannot force hedged reads on a minicluster, but we enable
+      this test to at least make sure that the code path doesn't break."""
+  @CustomClusterTestSuite.with_args("--use_hdfs_pread=true")
+  def test_hedged_reads(self, vector):
+    QUERY = "select * from tpch_parquet.lineitem limit 100"
+    self.client.execute(QUERY)
