Zoltán Borók-Nagy created IMPALA-14560:
------------------------------------------
Summary: Detecting remoteness of page index is flawed
Key: IMPALA-14560
URL: https://issues.apache.org/jira/browse/IMPALA-14560
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Zoltán Borók-Nagy
We use "scanner_->metadata_range_->expected_local()" to decide whether
reading the page index is expected to be a remote read. However,
"scanner_->metadata_range_" is the original split that the scanner is
currently processing, not the split that contains the metadata (footer).
A better approach would be to use a mechanism similar to what we have in
"HdfsScanner::IssueFooterRanges()", which works this way:
* Use HdfsScanner::FindFooterSplit() to retrieve the footer split
* If footer_split is NULL, then expected_local = false
* Otherwise use footer_split->expected_local()
For the page index we do not need to search for the last split: we know the
offsets and sizes of the column and offset indexes, so we can search for the
split(s) that contain them.
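A minimal sketch of that lookup, for illustration only. It does not use
Impala's real classes: "Split" and "PageIndexExpectedLocal()" are hypothetical
names, and the sketch assumes the splits of a file do not overlap. Any part of
the page index that no known split covers is treated as remote, mirroring the
"footer_split is NULL" case above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a split (scan range) of a file. This is NOT
// Impala's ScanRange class; it only carries what the sketch needs.
struct Split {
  int64_t offset;       // start offset of the split within the file
  int64_t len;          // length of the split in bytes
  bool expected_local;  // whether reads of this split are expected to be local
};

// Returns true only if every byte of the page index range
// [index_offset, index_offset + index_len) lies in a split that is expected
// to be local. Assumes the splits do not overlap each other.
bool PageIndexExpectedLocal(const std::vector<Split>& splits,
                            int64_t index_offset, int64_t index_len) {
  if (index_len <= 0) return false;
  const int64_t index_end = index_offset + index_len;
  int64_t covered = 0;
  for (const Split& s : splits) {
    // Intersect the split with the page index range.
    const int64_t begin = std::max(s.offset, index_offset);
    const int64_t end = std::min(s.offset + s.len, index_end);
    if (begin >= end) continue;  // this split does not touch the page index
    if (!s.expected_local) return false;  // part of the index is remote
    covered += end - begin;
  }
  // Bytes not covered by any known split are conservatively treated as remote.
  return covered == index_len;
}
```

With splits at [0, 100) local, [100, 200) remote and [200, 300) local, a page
index at offset 210 with length 50 would be reported as local, while one at
offset 150 with length 100 would be reported as remote because it overlaps the
remote split.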
Setting expected_local correctly is crucial for performance, as we only use the
remote data cache if the read is expected to be remote.