Joe McDonnell created IMPALA-13894:
--------------------------------------
Summary: Tuple cache correctness verification should proceed past file size differences
Key: IMPALA-13894
URL: https://issues.apache.org/jira/browse/IMPALA-13894
Project: IMPALA
Issue Type: Task
Components: Backend
Affects Versions: Impala 5.0.0
Reporter: Joe McDonnell
Tuple cache correctness verification does a fast check to see whether the two files
are identical. If the fast check finds that they are not identical, verification can
proceed to a slow check that accounts for differences in row order.
This fast check compares the file sizes, and if they differ, it returns a non-OK
status:
{noformat}
if (file1_length != file2_length || file1_length == TUPLE_TEXT_FILE_SIZE_ERROR) {
  return Status(TErrorCode::TUPLE_CACHE_INCONSISTENCY,
      Substitute("Size of file '$0' (size: $1) and '$2' (size: $3) are different",
          path_a + DEBUG_TUPLE_CACHE_BAD_POSTFIX, file1_length,
          path_b + DEBUG_TUPLE_CACHE_BAD_POSTFIX, file2_length));
}{noformat}
Returning a non-OK status causes the calling code to skip the slow check, which
could give more detail about what is different. We should change this to set
*passed = false and let the slow check proceed so that it produces a more
informative error message. It is also unclear whether the same rows in a
different order would always produce files of the same size; if they do not,
the size check could flag results that differ only in row order.
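A minimal sketch of the proposed control flow, in C++ to match the snippet above. Everything in it is hypothetical: SketchStatus stands in for Impala's Status, and FastCheck, Verify, and SameRowsIgnoringOrder are illustrative names, not Impala functions. It also assumes both cached results are already in memory as newline-separated rows, rather than being read from files by path.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Impala's Status. ok == false means a real error
// (e.g. an I/O failure); a content mismatch is reported through *passed.
struct SketchStatus {
  bool ok;
  static SketchStatus OK() { return {true}; }
};

// Splits newline-separated text into rows (one tuple per line, by assumption).
std::vector<std::string> SplitRows(const std::string& text) {
  std::vector<std::string> rows;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) rows.push_back(line);
  return rows;
}

// Slow check: compares the rows as sorted lists, so a pure reordering passes.
// On failure, fills *detail with row counts and the first differing row.
bool SameRowsIgnoringOrder(const std::string& a, const std::string& b,
                           std::string* detail) {
  std::vector<std::string> rows_a = SplitRows(a);
  std::vector<std::string> rows_b = SplitRows(b);
  std::sort(rows_a.begin(), rows_a.end());
  std::sort(rows_b.begin(), rows_b.end());
  if (rows_a == rows_b) return true;
  auto mm = std::mismatch(rows_a.begin(), rows_a.end(),
                          rows_b.begin(), rows_b.end());
  std::string row_a = mm.first == rows_a.end() ? std::string("<none>") : *mm.first;
  std::string row_b = mm.second == rows_b.end() ? std::string("<none>") : *mm.second;
  *detail = "rows: " + std::to_string(rows_a.size()) + " vs " +
            std::to_string(rows_b.size()) + "; first differing sorted row: '" +
            row_a + "' vs '" + row_b + "'";
  return false;
}

// Fast check (proposed behavior): a size difference sets *passed = false and
// returns OK, instead of returning a non-OK status that stops the caller.
SketchStatus FastCheck(const std::string& a, const std::string& b, bool* passed) {
  assert(passed != nullptr);
  // The size comparison mirrors the cheap file-size test; the byte comparison
  // only runs when the sizes match.
  *passed = a.size() == b.size() && a == b;
  return SketchStatus::OK();
}

// Caller: runs the slow, order-insensitive check whenever the fast check fails.
SketchStatus Verify(const std::string& a, const std::string& b, bool* passed,
                    std::string* detail) {
  SketchStatus s = FastCheck(a, b, passed);
  if (!s.ok) return s;  // Only genuine errors stop verification here.
  if (*passed) return SketchStatus::OK();
  *passed = SameRowsIgnoringOrder(a, b, detail);
  return SketchStatus::OK();
}
```

With this flow, a size mismatch no longer short-circuits verification: the slow check either passes (the rows were merely reordered) or produces a detail message with the row counts and the first differing row.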
--
This message was sent by Atlassian Jira
(v8.20.10#820010)