Joe McDonnell created IMPALA-8666:
-------------------------------------

             Summary: HdfsParquetScanner::ProcessFooter() should do validation when it reads a bigger footer
                 Key: IMPALA-8666
                 URL: https://issues.apache.org/jira/browse/IMPALA-8666
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 3.3.0
            Reporter: Joe McDonnell


In IMPALA-8561, a user encountered an error deserializing the footer when 
HdfsParquetScanner::ProcessFooter() issued a second IO for a Parquet footer 
that exceeded the default 100KB footer read size. IMPALA-8561 fixed an 
underlying issue that caused DiskIoMgr to return stale data in this case, but 
HdfsParquetScanner::ProcessFooter() still needs validation on the codepath 
that reads the larger footer. Specifically, that codepath does not check the 
magic value that should be at the end of the file, which the initial read does 
([https://github.com/apache/impala/blob/11a2e86c28c7c7dcf9f394a82fc4045760fff97b/be/src/exec/parquet/hdfs-parquet-scanner.cc#L1392-L1399]):
{code:java}
// Validate magic file bytes are correct.
uint8_t* magic_number_ptr =
    buffer + scan_range_len - sizeof(PARQUET_VERSION_NUMBER);
if (memcmp(magic_number_ptr, PARQUET_VERSION_NUMBER,
        sizeof(PARQUET_VERSION_NUMBER)) != 0) {
  return Status(TErrorCode::PARQUET_BAD_VERSION_NUMBER, filename(),
      string(reinterpret_cast<char*>(magic_number_ptr),
          sizeof(PARQUET_VERSION_NUMBER)),
      scan_node_->hdfs_table()->fully_qualified_name());
}
{code}
ProcessFooter() should perform this check on the buffer returned by the second, 
larger footer read as well. It should also verify that the footer size found in 
that buffer matches the size it saw earlier in the initial 100KB IO.
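
The two checks requested above could look roughly like the following standalone sketch. It is not the Impala implementation: FooterCheck, kParquetMagic and ValidateFooterBuffer() are hypothetical names made up for illustration, and the real code would return a Status instead. It assumes the buffer ends at the last byte of the file and relies on the Parquet file layout, where the file ends with a 4-byte little-endian metadata length followed by the "PAR1" magic.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result codes for this sketch; Impala itself returns Status objects.
enum class FooterCheck { OK, TOO_SHORT, BAD_MAGIC, SIZE_MISMATCH };

// Magic bytes at the end of a Parquet file.
static const uint8_t kParquetMagic[4] = {'P', 'A', 'R', '1'};

// Validates a footer buffer whose last byte is the last byte of the file.
// 'expected_metadata_size' is the metadata length decoded from the initial
// footer read (an assumed parameter; the caller has to keep that value around).
FooterCheck ValidateFooterBuffer(const uint8_t* buffer, size_t len,
    uint32_t expected_metadata_size) {
  const size_t kTrailerSize = sizeof(uint32_t) + sizeof(kParquetMagic);
  if (len < kTrailerSize) return FooterCheck::TOO_SHORT;

  // Check 1: the magic bytes must be at the very end of the buffer.
  const uint8_t* magic_ptr = buffer + len - sizeof(kParquetMagic);
  if (memcmp(magic_ptr, kParquetMagic, sizeof(kParquetMagic)) != 0) {
    return FooterCheck::BAD_MAGIC;
  }

  // Check 2: the 4-byte little-endian metadata length right before the magic
  // must match what the initial read reported.
  const uint8_t* size_ptr = magic_ptr - sizeof(uint32_t);
  const uint32_t metadata_size = static_cast<uint32_t>(size_ptr[0])
      | (static_cast<uint32_t>(size_ptr[1]) << 8)
      | (static_cast<uint32_t>(size_ptr[2]) << 16)
      | (static_cast<uint32_t>(size_ptr[3]) << 24);
  if (metadata_size != expected_metadata_size) return FooterCheck::SIZE_MISMATCH;

  // The buffer must actually contain the metadata it claims to hold.
  if (metadata_size > len - kTrailerSize) return FooterCheck::TOO_SHORT;
  return FooterCheck::OK;
}
```

With a helper along these lines, stale data returned for the second read would most likely fail the magic check or the size comparison rather than reaching footer deserialization.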
