IMPALA-6813: Hedged reads metrics broken when scanning non-HDFS based table
We realized that the libHDFS API call hdfsGetHedgedReadMetrics() crashes when the 'fs' argument passed to it is not a HDFS filesystem. There is an open bug for it on the HDFS side: HDFS-13417 However, it looks like we won't be getting a fix for it in the short term, so our only option at this point is to skip it. Testing: Made sure that enabling preads and scanning from S3 doesn't cause a crash. Also, added a custom cluster test to exercise the pread code path. We are unable to verify hedged reads in a minicluster, but we can at least exercise the code path to make sure that nothing breaks. Change-Id: I48fe80dfd9a1ed68a8f2b7038e5f42b5a3df3baa Reviewed-on: http://gerrit.cloudera.org:8080/9966 Reviewed-by: Sailesh Mukil <[email protected]> Tested-by: Impala Public Jenkins <[email protected]> Project: http://git-wip-us.apache.org/repos/asf/impala/repo Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/a3efde84 Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/a3efde84 Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/a3efde84 Branch: refs/heads/2.x Commit: a3efde84a5e0ef17357d24c3e69aa3f255eb4865 Parents: 466188b Author: Sailesh Mukil <[email protected]> Authored: Mon Apr 9 15:26:06 2018 -0700 Committer: Impala Public Jenkins <[email protected]> Committed: Fri May 25 23:17:16 2018 +0000 ---------------------------------------------------------------------- be/src/runtime/io/scan-range.cc | 4 +++- tests/custom_cluster/test_hedged_reads.py | 30 ++++++++++++++++++++++++++ 2 files changed, 33 insertions(+), 1 deletion(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/impala/blob/a3efde84/be/src/runtime/io/scan-range.cc ---------------------------------------------------------------------- diff --git a/be/src/runtime/io/scan-range.cc b/be/src/runtime/io/scan-range.cc index c868c3d..409e743 100644 --- a/be/src/runtime/io/scan-range.cc +++ b/be/src/runtime/io/scan-range.cc @@ -498,11 +498,13 @@ void ScanRange::Close() { closed_file = true; } - if (FLAGS_use_hdfs_pread) { + if (FLAGS_use_hdfs_pread && IsHdfsPath(file())) { // Update Hedged Read Metrics. // We call it only if the --use_hdfs_pread flag is set, to avoid having the // libhdfs client malloc and free a hdfsHedgedReadMetrics object unnecessarily // otherwise. 'hedged_metrics' is only set upon success. + // We also avoid calling hdfsGetHedgedReadMetrics() when the file is not on HDFS + // (see HDFS-13417). struct hdfsHedgedReadMetrics* hedged_metrics; int success = hdfsGetHedgedReadMetrics(fs_, &hedged_metrics); if (success == 0) { http://git-wip-us.apache.org/repos/asf/impala/blob/a3efde84/tests/custom_cluster/test_hedged_reads.py ---------------------------------------------------------------------- diff --git a/tests/custom_cluster/test_hedged_reads.py b/tests/custom_cluster/test_hedged_reads.py new file mode 100644 index 0000000..b24fd92 --- /dev/null +++ b/tests/custom_cluster/test_hedged_reads.py @@ -0,0 +1,30 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +import pytest +from tests.common.custom_cluster_test_suite import CustomClusterTestSuite +from tests.common.skip import SkipIf + [email protected]_hdfs +class TestHedgedReads(CustomClusterTestSuite): + """ Exercises the hedged reads code path. + NOTE: We unfortunately cannot force hedged reads on a minicluster, but we enable + this test to at least make sure that the code path doesn't break.""" + @CustomClusterTestSuite.with_args("--use_hdfs_pread=true") + def test_hedged_reads(self, vector): + QUERY = "select * from tpch_parquet.lineitem limit 100" + self.client.execute(QUERY)
