[
https://issues.apache.org/jira/browse/IMPALA-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631805#comment-17631805
]
Michael Smith commented on IMPALA-11704:
----------------------------------------
Exhaustive testing alone wasn't sufficient. Running the whole test suite
with the data cache enabled hit a fatal check failure:
{code}
F1109 03:43:19.240250 7283 hdfs-file-reader.cc:318] b14bc10d21ff3351:7e55cc7800000000] Check failed: exclusive_hdfs_fh_ != nullptr
*** Check failure stack trace: ***
@ 0x3753e0d google::LogMessage::Fail()
@ 0x3755d44 google::LogMessage::SendToLog()
@ 0x37537ec google::LogMessage::Flush()
@ 0x3756269 google::LogMessageFatal::~LogMessageFatal()
@ 0x1e763d2 impala::io::HdfsFileReader::CachedFile()
@ 0x1e6c904 impala::io::ScanRange::ReadFromCache()
@ 0x1e601fb impala::io::RequestContext::TryReadFromCache()
@ 0x1e625e4 impala::io::RequestContext::GetNextUnstartedRange()
@ 0x1a4f376 impala::HdfsScanNode::GetNextScanRangeToRead()
@ 0x1962320 impala::HdfsScanNodeBase::StartNextScanRange()
@ 0x1a5393a impala::HdfsScanNode::ScannerThread()
@ 0x1a543ce _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@ 0x18b7182 impala::Thread::SuperviseThread()
@ 0x18b7f8b boost::detail::thread_data<>::run()
@ 0x2380f77 thread_proxy
@ 0x7f7b49971ea5 start_thread
@ 0x7f7b468a6b0d __clone
{code}
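The failed check, {{exclusive_hdfs_fh_ != nullptr}}, is a precondition inside {{HdfsFileReader::CachedFile()}}, reached here from {{ScanRange::ReadFromCache()}}. The sketch below is a toy model of that kind of precondition, not Impala's code. It assumes a reader whose handle may not have been opened yet and shows one way a caller could satisfy the precondition by opening on demand. All names ({{ToyHandle}}, {{ToyFileReader}}, {{EnsureOpen}}) are hypothetical, and the accessor throws instead of aborting so the behavior is easy to observe.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for an exclusively owned file handle.
struct ToyHandle {
  int id;
};

// Toy reader, NOT Impala's HdfsFileReader. It only models a handle that may
// or may not have been opened before an accessor is called.
class ToyFileReader {
 public:
  // Opens the handle only if it is not already open; counts real opens.
  void EnsureOpen() {
    if (handle_ == nullptr) {
      handle_.reset(new ToyHandle{++opens_});
    }
  }

  // Accessor with the same kind of precondition as the failed check:
  // the handle must already exist. Throws where a fatal check would abort.
  const ToyHandle& CachedFile() const {
    if (handle_ == nullptr) {
      throw std::logic_error("precondition failed: handle != nullptr");
    }
    return *handle_;
  }

  bool is_open() const { return handle_ != nullptr; }
  int opens() const { return opens_; }

 private:
  std::unique_ptr<ToyHandle> handle_;
  int opens_ = 0;
};
```

In this model, calling {{CachedFile()}} on a fresh reader fails, while calling {{EnsureOpen()}} first makes it succeed. Repeated {{EnsureOpen()}} calls open the handle only once.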
> Remote Ozone scans are slow even after data cache warmup
> --------------------------------------------------------
>
> Key: IMPALA-11704
> URL: https://issues.apache.org/jira/browse/IMPALA-11704
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.1.1
> Reporter: Michael Smith
> Assignee: Michael Smith
> Priority: Major
> Fix For: Impala 4.2.0
>
>
> From [~drorke]:
> {quote}
> Running some basic performance sanity tests ... with Impala TPC-DS queries
> against Ozone vs HDFS. Impala appears to be using its data cache for both
> Ozone and HDFS remote reads, but in the case of Ozone reads I'm still seeing
> long scan times and high I/O wait times even after cache warmup. Excerpts
> below from profiles of q90. Note in both cases the Impala profiles show 100%
> cache hit rates but for some reason the scan IO wait times are still much
> longer for the Ozone scans.
> {noformat}
> HDFS:
> - TotalTime: 1s924ms
> - ScannerIoWaitTime: 52.037ms
> Ozone:
> - TotalTime: 8s917ms
> - ScannerIoWaitTime: 7s454ms{noformat}
> If I disable the local cache explicitly via query option I get the following
> times for the same scan:
> {noformat}
> HDFS:
> - TotalTime: 7s792ms
> - ScannerIoWaitTime: 6s244ms
> Ozone:
> - TotalTime: 8s963ms
> - ScannerIoWaitTime: 7s464ms{noformat}
> {quote}
> Investigating a bit, [~joemcdonnell] noticed in the Ozone profile
> {noformat}
> - ScannerIoWaitTime: 7s454ms
> - TotalRawHdfsOpenFileTime: 5s782ms
> {noformat}
> Based on the profile difference in {{TotalRawHdfsOpenFileTime}} ({{5s782ms}}
> for Ozone vs {{0ms}} for HDFS), I believe the performance gap appears when
> the data cache is in use but the file handle cache is disabled. That traces back to an
> incomplete implementation of
> [IMPALA-10147|https://issues.apache.org/jira/browse/IMPALA-10147].
> A data read:
> 1. [Checks that it can open a file
> handle|https://github.infra.cloudera.com/CDH/Impala/blob/CDWH-2022.0.10.1/be/src/runtime/io/scan-range.cc#L199].
> When the file handle cache is enabled, this is a
> [no-op|https://github.infra.cloudera.com/CDH/Impala/blob/CDWH-2022.0.10.1/be/src/runtime/io/hdfs-file-reader.cc#L67].
> 2. Tries to read data. If the data cache is enabled, it first [tries to
> read from the data
> cache|https://github.infra.cloudera.com/CDH/Impala/blob/CDWH-2022.0.10.1/be/src/runtime/io/hdfs-file-reader.cc#L137].
> 3. On a data cache hit, returns that data; any file handle that was opened
> goes unused.
> When the file handle cache is disabled, opening the file handle [calls
> hdfsOpenFile and
> hdfsSeek|https://github.infra.cloudera.com/CDH/Impala/blob/CDWH-2022.0.10.1/be/src/runtime/io/hdfs-file-reader.cc#L70-L72].
> {{hdfsOpenFile}} in particular is timed and reported in the profile as
> {{TotalRawHdfsOpenFileTime}}. That time (5s782ms of the 7s454ms
> {{ScannerIoWaitTime}} in the Ozone profile) accounts for most of the
> difference in performance between HDFS and Ozone in this case.
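The three steps in the description can be sketched as a toy model. This is an illustration under stated assumptions, not Impala's implementation: {{ToyReader}}, {{ToyScanStats}} and {{ScanWarmCache}} are hypothetical names, a "raw open" stands in for an {{hdfsOpenFile}} call, and every block is assumed to already be in the data cache. The point it shows is that with the file handle cache disabled, the open cost is paid for every scan range even when every read is a data cache hit.

```cpp
#include <cassert>

// Counters for the toy model; all names are hypothetical.
struct ToyScanStats {
  int raw_opens = 0;        // stands in for hdfsOpenFile calls
  int data_cache_hits = 0;  // reads served from the data cache
  int remote_reads = 0;     // reads that had to go to remote storage
};

// Toy model of the read path in the description, NOT Impala's classes.
class ToyReader {
 public:
  ToyReader(bool fh_cache_enabled, bool data_cache_enabled)
      : fh_cache_enabled_(fh_cache_enabled),
        data_cache_enabled_(data_cache_enabled) {}

  // Step 1: check that a file handle can be opened.
  void Open() {
    if (fh_cache_enabled_) return;  // no-op when the file handle cache is on
    ++stats_.raw_opens;             // otherwise the open cost is paid up front
  }

  // Steps 2 and 3: try the data cache first; on a hit the handle goes unused.
  void Read(bool block_is_cached) {
    if (data_cache_enabled_ && block_is_cached) {
      ++stats_.data_cache_hits;
      return;
    }
    ++stats_.remote_reads;
  }

  const ToyScanStats& stats() const { return stats_; }

 private:
  bool fh_cache_enabled_;
  bool data_cache_enabled_;
  ToyScanStats stats_;
};

// Scans num_ranges ranges whose blocks are all present in the data cache.
inline ToyScanStats ScanWarmCache(bool fh_cache_enabled,
                                  bool data_cache_enabled, int num_ranges) {
  assert(num_ranges >= 0);
  ToyReader reader(fh_cache_enabled, data_cache_enabled);
  for (int i = 0; i < num_ranges; ++i) {
    reader.Open();
    reader.Read(/*block_is_cached=*/true);
  }
  return reader.stats();
}
```

In this model, {{ScanWarmCache(false, true, 100)}} records 100 raw opens alongside 100 data cache hits, while {{ScanWarmCache(true, true, 100)}} records the same hits with no raw opens.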
--
This message was sent by Atlassian Jira
(v8.20.10#820010)