[
https://issues.apache.org/jira/browse/IMPALA-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856813#comment-16856813
]
ASF subversion and git services commented on IMPALA-8561:
---------------------------------------------------------
Commit e573b5502d4a93623dd1375e8e8febf9647c98db in impala's branch
refs/heads/master from Michael Ho
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e573b55 ]
IMPALA-8562: Data cache should skip insertion of uncacheable ScanRanges
As shown in IMPALA-8561, there are some paths in the code which
create uncacheable ScanRanges. These uncacheable ScanRanges have
mtime of -1. 'mtime' is used for differentiating versions of files
with the same names. An mtime == -1 means the cache entry could
potentially be from any versions of a file with the same name.
This change skips lookup or insertion of ScanRange with negative
mtime, file offset or buffer length.
Testing done: Added targeted test cases in data-cache-test
Change-Id: I2294833b075a2ddcae956d9fdb04f3e85adb0391
Reviewed-on: http://gerrit.cloudera.org:8080/13369
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> ScanRanges with mtime=-1 can lead to inconsistent reads when using the file
> handle cache
> ----------------------------------------------------------------------------------------
>
> Key: IMPALA-8561
> URL: https://issues.apache.org/jira/browse/IMPALA-8561
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.3.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Blocker
>
> {color:red}colored text{color}The file handle cache relies on the mtime to
> distinguish between different versions of a file. For example, if file X
> exists with mtime=1, then it is overwritten and the metadata is updated so
> that now it is at mtime=2, the file handle cache treats them as completely
> different things and can never use a single file handle to serve both.
> However, some codepaths generate ScanRanges with an mtime of -1. This removes
> the ability to distinguish these two versions of a file and can read to
> consistency problems.
> A specific example is the code that reads the parquet footer
> [HdfsParquetScanner::ProcessFooter()|https://github.com/apache/impala/blob/832c9de7810b47b5f782bccb761e07264e7548e5/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1354].
> We don't know ahead of time how big the Parquet footer is. So, we read 100KB
> (determined by
> [FOOTER_SIZE|https://github.com/apache/impala/blob/449fe73d2145bd22f0f857623c3652a097f06d73/be/src/exec/hdfs-scanner.h#L331]).
> If the footer size encoded in the last few bytes of the file indicates that
> the footer is larger than that [code
> here|https://github.com/apache/impala/blob/832c9de7810b47b5f782bccb761e07264e7548e5/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1414],
> then we issue a separate read for the actual size of the footer. That
> separate read does not inherit the mtime of the original read and instead
> uses an mtime of -1. I verified this by adding tracing and issuing a select
> against functional_parquet.widetable_1000_cols.
> A failure scenario associated with this is that we read the last 100KB using
> a ScanRange with mtime=2, then we find that the footer is larger than 100KB
> and issue a ScanRange with mtime=-1. This uses a file handle that is from a
> previous version of the file equivalent to mtime=1. The data it is reading
> may not come from the end of the file, or it may be at the end of the file
> but the footer has a different length. (There is no validation on the new
> read to check the magic value or metadata size reported by the new buffer.)
> Either would result in a failure to deserialize the thrift for the footer.
> For example, a problem case could produce an error message like:
>
> {noformat}
> File hdfs://test-warehouse/example_file.parq of length 1048576 bytes has
> invalid file metadata at file offset 462017. Error = couldn't deserialize
> thrift msg:
> TProtocolException: Invalid data
> .{noformat}
> To fix this, we should examine all locations that can result in ScanRanges
> with mtime=-1 and eliminate any that we can. For example, the
> HdfsParquetScanner::ProcessFooter() code should create a ScanRange that
> inherits the mtime from the original footer ScanRange. Also, the file handle
> cache should refuse to cache file handles with mtime=-1.
> The code in HdfsParquetScanner::ProcessFooter() should add validation for the
> magic value and metadata size when reading a footer larger than 100KB to
> verify that we are reading something valid. The thrift deserialize failure
> gives some information, but catching this case more specifically would
> provide a better error message.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]