Joe McDonnell created IMPALA-8666:
-------------------------------------
Summary: HdfsParquetScanner::ProcessFooter() should do validation
when it reads a bigger footer
Key: IMPALA-8666
URL: https://issues.apache.org/jira/browse/IMPALA-8666
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Joe McDonnell
In IMPALA-8561, a user encountered an error deserializing the footer when
HdfsParquetScanner::ProcessFooter() issues an IO for a Parquet footer that
exceeds the default 100KB size. IMPALA-8561 fixed an underlying issue that
would result in stale data being returned by DiskIoMgr in this case, but
HdfsParquetScanner::ProcessFooter() needs to add validation to the codepath
reading the larger footer. Specifically, that codepath does not check the
magic value that should be at the end of the file, as the initial read does
([https://github.com/apache/impala/blob/11a2e86c28c7c7dcf9f394a82fc4045760fff97b/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1392-L1399]):
{code:cpp}
// Validate magic file bytes are correct.
uint8_t* magic_number_ptr = buffer + scan_range_len - sizeof(PARQUET_VERSION_NUMBER);
if (memcmp(magic_number_ptr, PARQUET_VERSION_NUMBER,
        sizeof(PARQUET_VERSION_NUMBER)) != 0) {
  return Status(TErrorCode::PARQUET_BAD_VERSION_NUMBER, filename(),
      string(reinterpret_cast<char*>(magic_number_ptr),
          sizeof(PARQUET_VERSION_NUMBER)),
      scan_node_->hdfs_table()->fully_qualified_name());
}
{code}
ProcessFooter() should run the same check on the new, larger footer buffer. It
should also verify that the footer length encoded in the larger buffer matches
the length it read earlier from the initial 100KB IO.
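A minimal standalone sketch of the two proposed checks follows. It assumes the standard Parquet file layout, in which a file ends with the serialized file metadata, then a 4-byte little-endian metadata length, then the 4-byte magic `PAR1`. `ValidateLargerFooter`, `FooterCheck`, and `DecodeLe32` are hypothetical names, not part of Impala. Inside ProcessFooter() the result would presumably be converted into a `Status`. The bad-magic case, for example, could map to the `PARQUET_BAD_VERSION_NUMBER` error used above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Parquet tail layout: [file metadata][4-byte LE metadata length]["PAR1"].
constexpr size_t kMagicLen = 4;
constexpr size_t kLengthFieldLen = 4;
constexpr uint8_t kMagic[kMagicLen] = {'P', 'A', 'R', '1'};

// Hypothetical result codes, one per failed check.
enum class FooterCheck { kOk, kTooShort, kBadMagic, kLengthMismatch, kTruncated };

// Reads a 4-byte little-endian unsigned integer independent of host endianness.
uint32_t DecodeLe32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical validation of the re-read (larger) footer buffer.
// 'buffer' is assumed to hold the last 'len' bytes of the file, and
// 'expected_metadata_len' the metadata length decoded from the initial 100KB IO.
FooterCheck ValidateLargerFooter(const uint8_t* buffer, size_t len,
                                 uint32_t expected_metadata_len) {
  if (buffer == nullptr || len < kLengthFieldLen + kMagicLen) {
    return FooterCheck::kTooShort;
  }
  // 1. The larger buffer must still end with the magic bytes.
  const uint8_t* magic = buffer + len - kMagicLen;
  if (std::memcmp(magic, kMagic, kMagicLen) != 0) return FooterCheck::kBadMagic;
  // 2. The encoded metadata length must match what the initial IO reported.
  uint32_t metadata_len = DecodeLe32(magic - kLengthFieldLen);
  if (metadata_len != expected_metadata_len) return FooterCheck::kLengthMismatch;
  // 3. The buffer must be large enough to hold the whole metadata block.
  if (static_cast<uint64_t>(metadata_len) + kLengthFieldLen + kMagicLen > len) {
    return FooterCheck::kTruncated;
  }
  return FooterCheck::kOk;
}
```

In this sketch a `kLengthMismatch` result means the second IO returned bytes that disagree with the first one, for example stale data of the kind described in IMPALA-8561. This case is then reported as an error instead of being passed to the footer deserializer.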