Joe McDonnell created IMPALA-8561:
-------------------------------------
Summary: ScanRanges with mtime=-1 can lead to inconsistent reads
when using the file handle cache
Key: IMPALA-8561
URL: https://issues.apache.org/jira/browse/IMPALA-8561
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell
The file handle cache relies on the mtime to distinguish between different
versions of a file. For example, if file X exists with mtime=1, then it is
overwritten and the metadata is updated so that now it is at mtime=2, the file
handle cache treats them as completely different things and can never use a
single file handle to serve both. However, some codepaths generate ScanRanges
with an mtime of -1. This removes the ability to distinguish these two versions
of a file and can read to consistency problems.
A specific example is the code that reads the parquet footer
[HdfsParquetScanner::ProcessFooter()|[https://github.com/apache/impala/blob/832c9de7810b47b5f782bccb761e07264e7548e5/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1354]].
We don't know ahead of time how big the Parquet footer is. So, we read 100KB
(determined by
[FOOTER_SIZE|[https://github.com/apache/impala/blob/449fe73d2145bd22f0f857623c3652a097f06d73/be/src/exec/hdfs-scanner.h#L331]]).
If the footer size encoded in the last few bytes of the file indicates that
the footer is larger than that [code
here|[https://github.com/apache/impala/blob/832c9de7810b47b5f782bccb761e07264e7548e5/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1414]],
then we issue a separate read for the actual size of the footer. That separate
read does not inherit the mtime of the original read and instead uses an mtime
of -1. I verified this by adding tracing and issuing a select against
functional_parquet.widetable_1000_cols.
A failure scenario associated with this is that we read the last 100KB using a
ScanRange with mtime=2, then we find that the footer is larger than 100KB and
issue a ScanRange with mtime=-1. This uses a file handle that is from a
previous version of the file equivalent to mtime=1. The data it is reading may
not come from the end of the file, or it may be at the end of the file but the
footer has a different length. (There is no validation on the new read to check
the magic value or metadata size reported by the new buffer.) Either would
result in a failure to deserialize the thrift for the footer. For example, a
problem case could produce an error message like:
{noformat}
File hdfs://test-warehouse/example_file.parq of length 1048576 bytes has
invalid file metadata at file offset 462017. Error = couldn't deserialize
thrift msg:
TProtocolException: Invalid data
.{noformat}
To fix this, we should examine all locations that can result in ScanRanges with
mtime=-1 and eliminate any that we can. For example, the
HdfsParquetScanner::ProcessFooter() code should create a ScanRange that
inherits the mtime from the original footer ScanRange. Also, the file handle
cache should refuse to cache file handles with mtime=-1.
The code in HdfsParquetScanner::ProcessFooter() should add validation for the
magic value and metadata size when reading a footer larger than 100KB to verify
that we are reading something valid. The thrift deserialize failure gives some
information, but catching this case more specifically would provide a better
error message.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]