[ 
https://issues.apache.org/jira/browse/IMPALA-13894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17942326#comment-17942326
 ] 

ASF subversion and git services commented on IMPALA-13894:
----------------------------------------------------------

Commit a877cde76df7f5435a72fc7ea4a870ca7bf28fb5 in impala's branch 
refs/heads/master from Yida Wu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a877cde76 ]

IMPALA-13894: Allow slow check in tuple cache correctness verification when 
file sizes differ

Currently, tuple cache correctness verification does a fast check
and returns an error if the file sizes differ.

This patch allows the slow check to run when file sizes differ. The
slow check may provide a clearer error message, and it helps prevent
false mismatches when identical rows appear in a different order,
which may lead to size differences. Updated TupleTextFileUtilTest
for this change.
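
To illustrate why an order-insensitive comparison avoids false
mismatches, here is a minimal sketch. It is not Impala's actual
verification code: the names CountRows and SameRowsIgnoringOrder are
hypothetical, and rows are assumed to be plain text lines.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts how often each distinct row occurs, so the
// order in which rows were written no longer matters.
static std::unordered_map<std::string, int> CountRows(
    const std::vector<std::string>& rows) {
  std::unordered_map<std::string, int> counts;
  for (const std::string& row : rows) ++counts[row];
  return counts;
}

// Hypothetical slow check: true when both inputs hold the same rows with
// the same multiplicities, regardless of order.
bool SameRowsIgnoringOrder(const std::vector<std::string>& a,
                           const std::vector<std::string>& b) {
  if (a.size() != b.size()) return false;  // different row counts
  return CountRows(a) == CountRows(b);
}
```

With this sketch, {"1,a", "2,b"} and {"2,b", "1,a"} compare as equal,
while {"1,a", "1,a"} and {"1,a", "2,b"} do not.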

Also fixes the argument order in VerifyRows() to correct misleading
log output. The previous order was incorrect, causing the log to
show the wrong file as the reference file.

Tests:
Passed core tests.
Manually verified that when file sizes differ, the query proceeds
to the slow check after this change.

Change-Id: I02e031410dac32d9df746201b156783a8b7d9a1a
Reviewed-on: http://gerrit.cloudera.org:8080/22661
Reviewed-by: Michael Smith
Reviewed-by: Kurt Deschler
Tested-by: Impala Public Jenkins


> Tuple cache correctness verification should proceed past file size differences
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-13894
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13894
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Yida Wu
>            Priority: Major
>             Fix For: Impala 5.0.0
>
>
> Tuple cache correctness verification does a fast check to see if the two 
> files are identical. If it determines that they are not identical, then it 
> can proceed to a slow check that corrects for order differences.
> This fast check looks at the file sizes and if they are not the same, it 
> returns a not-OK status:
> {noformat}
>   if (file1_length != file2_length || file1_length == TUPLE_TEXT_FILE_SIZE_ERROR) {
>     return Status(TErrorCode::TUPLE_CACHE_INCONSISTENCY,
>         Substitute("Size of file '$0' (size: $1) and '$2' (size: $3) are different",
>             path_a + DEBUG_TUPLE_CACHE_BAD_POSTFIX, file1_length,
>             path_b + DEBUG_TUPLE_CACHE_BAD_POSTFIX, file2_length));
>   }
> {noformat}
> Returning a not-OK status causes the calling code to skip the slow 
> check, which can give more detail about what is different. We should change 
> this to set *passed = false and let the slower check go forward so that it 
> produces a more informative error message. It's also unclear whether the same 
> rows in a different order would always have the same size.
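
The control flow proposed above could look roughly like the sketch
below. This is an illustration only, not the committed change:
SketchStatus, kSizeError, and FastCheckSketch are hypothetical
stand-ins for Impala's Status, TUPLE_TEXT_FILE_SIZE_ERROR, and the
real fast check. Treating an unreadable size as a hard error is an
assumption of the sketch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for Impala's Status; only what the sketch needs.
struct SketchStatus {
  bool ok;
  std::string msg;
};

// Hypothetical sentinel playing the role of TUPLE_TEXT_FILE_SIZE_ERROR.
constexpr int64_t kSizeError = -1;

// Fast check sketch: a size mismatch sets *passed = false but still
// returns an OK status, so the caller can proceed to the slow check.
// A real fast check would also compare file contents when sizes match;
// this sketch stops at the size comparison.
SketchStatus FastCheckSketch(int64_t size_a, int64_t size_b, bool* passed) {
  *passed = false;
  if (size_a == kSizeError || size_b == kSizeError) {
    // Assumption: failing to read a size remains a hard error.
    return {false, "could not determine file size"};
  }
  *passed = (size_a == size_b);
  return {true, ""};
}
```

A caller would run the slow check only when the returned status is OK
and *passed is false.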


