[
https://issues.apache.org/jira/browse/IMPALA-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Smith resolved IMPALA-11704.
------------------------------------
Resolution: Fixed
> Remote Ozone scans are slow even after data cache warmup
> --------------------------------------------------------
>
> Key: IMPALA-11704
> URL: https://issues.apache.org/jira/browse/IMPALA-11704
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.1.1
> Reporter: Michael Smith
> Assignee: Michael Smith
> Priority: Major
> Fix For: Impala 4.2.0
>
>
> From [~drorke]:
> {quote}
> Running some basic performance sanity tests ... with Impala TPC-DS queries
> against Ozone vs HDFS. Impala appears to be using its data cache for both
> Ozone and HDFS remote reads, but in the case of Ozone reads I'm still seeing
> long scan times and high I/O wait times even after cache warmup. Excerpts
> below from profiles of q90. Note in both cases the Impala profiles show 100%
> cache hit rates but for some reason the scan IO wait times are still much
> longer for the Ozone scans.
> {noformat}
> HDFS:
> - TotalTime: 1s924ms
> - ScannerIoWaitTime: 52.037ms
> Ozone:
> - TotalTime: 8s917ms
> - ScannerIoWaitTime: 7s454ms{noformat}
> If I disable the local cache explicitly via query option I get the following
> times for the same scan:
> {noformat}
> HDFS:
> - TotalTime: 7s792ms
> - ScannerIoWaitTime: 6s244ms
> Ozone:
> - TotalTime: 8s963ms
> - ScannerIoWaitTime: 7s464ms{noformat}
> {quote}
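To make the comparison above concrete, here is a minimal sketch that parses the quoted durations and computes how much of each scan is I/O wait and how much the data cache helps. The helper names (parse_duration_ms, io_wait_share, cache_speedup) and the PROFILES table are hypothetical and exist only for this illustration; the numbers are copied from the excerpts above.

```python
import re

# Unit suffixes as they appear in the quoted profile excerpts, in milliseconds.
_UNIT_MS = {"h": 3_600_000.0, "m": 60_000.0, "s": 1_000.0, "ms": 1.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_duration_ms(text: str) -> float:
    """Convert a duration such as '7s454ms' or '52.037ms' to milliseconds."""
    if not re.fullmatch(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+", text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * _UNIT_MS[unit] for value, unit in _TOKEN.findall(text))


# (TotalTime, ScannerIoWaitTime) for the q90 scan, copied from the excerpts.
# "warm" = data cache enabled and warmed up, "cold" = data cache disabled.
PROFILES = {
    "warm": {"HDFS": ("1s924ms", "52.037ms"), "Ozone": ("8s917ms", "7s454ms")},
    "cold": {"HDFS": ("7s792ms", "6s244ms"), "Ozone": ("8s963ms", "7s464ms")},
}


def io_wait_share(total: str, io_wait: str) -> float:
    """Fraction of the scan's total time that was spent waiting on I/O."""
    return parse_duration_ms(io_wait) / parse_duration_ms(total)


def cache_speedup(system: str) -> float:
    """Ratio of cache-disabled TotalTime to cache-enabled TotalTime."""
    cold_total, _ = PROFILES["cold"][system]
    warm_total, _ = PROFILES["warm"][system]
    return parse_duration_ms(cold_total) / parse_duration_ms(warm_total)


if __name__ == "__main__":
    for system in ("HDFS", "Ozone"):
        share = io_wait_share(*PROFILES["warm"][system])
        print(f"{system}: warm I/O wait share {share:.1%}, "
              f"cache speedup {cache_speedup(system):.2f}x")
```

With these figures the data cache gives HDFS roughly a 4x speedup, while the Ozone scan is essentially unchanged and still spends most of its time in I/O wait.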
> Investigating a bit, [~joemcdonnell] noticed in the Ozone profile
> {noformat}
> - ScannerIoWaitTime: 7s454ms
> - TotalRawHdfsOpenFileTime: 5s782ms
> {noformat}
> Based on the difference in {{TotalRawHdfsOpenFileTime}} ({{5s782ms}} for Ozone vs
> {{0ms}} for HDFS), I believe the slowdown occurs when the data cache is
> enabled but the file handle cache is disabled. That traces back to an
> incomplete implementation of
> [IMPALA-10147|https://issues.apache.org/jira/browse/IMPALA-10147].
> A data read proceeds as follows:
> 1. It [checks that it can open a file
> handle|https://github.infra.cloudera.com/CDH/Impala/blob/CDWH-2022.0.10.1/be/src/runtime/io/scan-range.cc#L199].
> When the file handle cache is enabled, this is a
> [no-op|https://github.infra.cloudera.com/CDH/Impala/blob/CDWH-2022.0.10.1/be/src/runtime/io/hdfs-file-reader.cc#L67].
> 2. It then tries to read data. If the data cache is enabled, it first [tries to
> read from the data
> cache|https://github.infra.cloudera.com/CDH/Impala/blob/CDWH-2022.0.10.1/be/src/runtime/io/hdfs-file-reader.cc#L137].
> 3. If the data cache hits, that data is returned and any open file handle goes
> unused.
> When the file handle cache is disabled, opening the file handle [calls
> hdfsOpenFile and
> hdfsSeek|https://github.infra.cloudera.com/CDH/Impala/blob/CDWH-2022.0.10.1/be/src/runtime/io/hdfs-file-reader.cc#L70-L72],
> so the cost of opening the file is paid in step 1 even when the data cache
> hits in step 3. {{hdfsOpenFile}} in particular is monitored and added to the
> profile as {{TotalRawHdfsOpenFileTime}}. That time ({{5s782ms}} in the Ozone
> profile) accounts for most of the roughly 7s gap in {{TotalTime}} between
> HDFS and Ozone in this case.
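The read path described above can be sketched as a toy model. This is not Impala's code: ToyReader, simulate, and the open_calls counter (standing in for hdfsOpenFile plus hdfsSeek) are hypothetical names chosen for illustration, and handle acquisition on a data cache miss is deliberately left out of the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ToyReader:
    """Toy model of the three-step read path; not Impala's implementation."""
    file_handle_cache_enabled: bool
    data_cache: Dict[Tuple[str, int], bytes] = field(default_factory=dict)
    open_calls: int = 0    # stands in for hdfsOpenFile + hdfsSeek
    remote_reads: int = 0  # reads that had to go to remote storage

    def _open(self) -> None:
        # Step 1: a no-op when the file handle cache is enabled;
        # otherwise a handle is opened eagerly, before the data cache lookup.
        if not self.file_handle_cache_enabled:
            self.open_calls += 1

    def read(self, path: str, offset: int,
             fetch_remote: Callable[[str, int], bytes]) -> bytes:
        self._open()
        key = (path, offset)
        # Steps 2 and 3: a data cache hit returns immediately, so any handle
        # opened in step 1 was never needed.
        if key in self.data_cache:
            return self.data_cache[key]
        # Miss: read remotely and populate the data cache.
        self.remote_reads += 1
        data = fetch_remote(path, offset)
        self.data_cache[key] = data
        return data


def simulate(file_handle_cache_enabled: bool, num_ranges: int = 100) -> Tuple[int, int]:
    """Warm the data cache, then return (open_calls, remote_reads) for a warm pass."""
    reader = ToyReader(file_handle_cache_enabled)

    def fetch(path: str, offset: int) -> bytes:
        return b"x"  # placeholder payload

    for i in range(num_ranges):  # warm-up pass fills the data cache
        reader.read("/toy/file", i, fetch)
    reader.open_calls = 0
    reader.remote_reads = 0
    for i in range(num_ranges):  # warm pass: every read hits the data cache
        reader.read("/toy/file", i, fetch)
    return reader.open_calls, reader.remote_reads


if __name__ == "__main__":
    print("file handle cache enabled: ", simulate(True))
    print("file handle cache disabled:", simulate(False))
```

In this model a fully warm data cache produces no remote reads in either configuration, but with the file handle cache disabled every read still pays for one open. That matches the shape of the Ozone profile: a 100% data cache hit rate alongside a large {{TotalRawHdfsOpenFileTime}}.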