Zoltán Borók-Nagy created IMPALA-14560:
------------------------------------------

             Summary: Detecting remoteness of page index is flawed
                 Key: IMPALA-14560
                 URL: https://issues.apache.org/jira/browse/IMPALA-14560
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Zoltán Borók-Nagy


We use the "scanner_->metadata_range_->expected_local()" method to detect whether
reading the page index is expected to be a remote read. However,
"scanner_->metadata_range_" is essentially the original split, i.e. the one the
scanner is currently processing, not the split that contains the metadata (footer).

A better approach would be to use a mechanism similar to what we have in 
"HdfsScanner::IssueFooterRanges()", which works this way:
 * Use HdfsScanner::FindFooterSplit() to retrieve the footer split
 * If footer_split is NULL, then expected_local = false
 * Otherwise use footer_split->expected_local()
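
A minimal sketch of the three steps above. It does not use Impala's real classes: "Split" is a hypothetical stand-in for a scan range, and "FindFooterSplit" here is a standalone analogue of HdfsScanner::FindFooterSplit(), not its actual signature. It assumes the footer split is the one containing the last byte of the file.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a scan range / split (not Impala's actual class).
struct Split {
  int64_t offset;       // start offset of the split within the file
  int64_t length;       // number of bytes covered by the split
  bool expected_local;  // whether the scheduler expects a local read
};

// Hypothetical analogue of HdfsScanner::FindFooterSplit(): returns the split
// containing the last byte of the file, or nullptr if no split covers it.
const Split* FindFooterSplit(const std::vector<Split>& splits, int64_t file_len) {
  for (const Split& s : splits) {
    if (s.offset < file_len && s.offset + s.length >= file_len) return &s;
  }
  return nullptr;
}

// Mirrors the logic described for IssueFooterRanges().
bool FooterExpectedLocal(const std::vector<Split>& splits, int64_t file_len) {
  const Split* footer_split = FindFooterSplit(splits, file_len);
  // No split covers the footer: conservatively treat the read as remote.
  if (footer_split == nullptr) return false;
  return footer_split->expected_local;
}
```

For a file of 150 bytes split into [0, 100) local and [100, 150) remote, the footer lies in the second split, so the footer read is expected to be remote even if the split being scanned is local.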

But instead of searching for the last split, we can use the known offsets and
sizes of the column and offset indexes to find the split(s) that contain them.
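
A sketch of how the split search could work for the page index. It assumes the page index is read as one contiguous byte range [range_begin, range_end) computed from the column and offset index offsets and sizes. Because that range may span several splits, this version treats the read as local only if every byte is covered by splits that are expected to be local. Any uncovered gap is treated as remote, mirroring the NULL case above. "Split" and "RangeExpectedLocal" are hypothetical names, not Impala code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a scan range / split (not Impala's actual class).
struct Split {
  int64_t offset;       // start offset of the split within the file
  int64_t length;       // number of bytes covered by the split
  bool expected_local;  // whether the scheduler expects a local read
};

// Returns true only if every byte of [range_begin, range_end) lies in a split
// that is expected to be local. Gaps and remote splits make the read remote.
bool RangeExpectedLocal(std::vector<Split> splits, int64_t range_begin,
                        int64_t range_end) {
  if (range_begin >= range_end) return false;  // empty range: be conservative
  std::sort(splits.begin(), splits.end(),
            [](const Split& a, const Split& b) { return a.offset < b.offset; });
  int64_t covered_until = range_begin;  // first byte not yet known to be local
  for (const Split& s : splits) {
    int64_t s_end = s.offset + s.length;
    if (s_end <= covered_until) continue;  // split lies before the uncovered part
    if (s.offset > covered_until) break;   // gap: some bytes are in no split
    if (!s.expected_local) return false;   // part of the range is remote
    covered_until = s_end;
    if (covered_until >= range_end) return true;  // whole range covered locally
  }
  return false;  // ran out of splits before covering the range
}
```

Requiring all covering splits to be local is a conservative choice: a partially remote page index read is flagged as remote, so it can still benefit from the remote data cache.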

Setting expected_local correctly is crucial for performance, as we only use the 
remote data cache if the read is expected to be remote.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
