[ 
https://issues.apache.org/jira/browse/IMPALA-14473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028159#comment-18028159
 ] 

ASF subversion and git services commented on IMPALA-14473:
----------------------------------------------------------

Commit 762fe0a4f5c9089e8a75fd992ab39c85943db562 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=762fe0a4f ]

IMPALA-14473: Fix absolute path logic for sorting scan ranges oldest to newest

When IMPALA-14462 added tie-breaking logic to
ScanRangeOldestToNewestComparator, it relied on the absolute
path being unset whenever the relative path is set. However, the
code always sets the absolute path and uses an empty string to
indicate that there is no absolute path. This caused the
tie-breaking logic to see two unrelated scan ranges as equal,
triggering a DCHECK when running
query_test/test_tuple_cache_tpc_queries.py.

The fix is to rearrange the logic to check whether the relative
path is not empty rather than checking whether the absolute
path is set.
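
The difference between the two checks can be illustrated with a small
standalone sketch. This is not the Impala code: FileSplit, SortKey,
BuggyKey and FixedKey are hypothetical names, the struct only mirrors a
few THdfsFileSplit fields visible in the log below, and the exact field
order used for tie-breaking is an assumption. The "is set" flag is
modeled as a plain bool that is always true.

```cpp
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical stand-in for a few THdfsFileSplit fields (assumption).
struct FileSplit {
  std::string relative_path;
  std::string absolute_path;         // "" means "no absolute path"
  bool absolute_path_is_set = true;  // always true, as the commit describes
  int64_t offset = 0;
  int64_t mtime = 0;
  int32_t partition_path_hash = 0;
};

// Key layout: mtime first (oldest to newest), then tie-breakers.
// The bool separates absolute-path splits from relative-path splits.
using SortKey = std::tuple<int64_t, bool, int32_t, std::string, int64_t>;

// Pre-fix style check: asks whether the absolute path is "set".
// Because it is always set, the (empty) absolute path is always used.
SortKey BuggyKey(const FileSplit& s) {
  if (s.absolute_path_is_set) {
    return SortKey(s.mtime, true, 0, s.absolute_path, s.offset);
  }
  return SortKey(s.mtime, false, s.partition_path_hash, s.relative_path,
                 s.offset);
}

// Post-fix style check: asks whether the relative path is non-empty.
SortKey FixedKey(const FileSplit& s) {
  if (!s.relative_path.empty()) {
    return SortKey(s.mtime, false, s.partition_path_hash, s.relative_path,
                   s.offset);
  }
  return SortKey(s.mtime, true, 0, s.absolute_path, s.offset);
}
```

With two splits that share mtime and offset, have empty absolute
paths, and differ only in relative path and partition hash, BuggyKey
yields identical keys while FixedKey yields distinct ones.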

Testing:
 - Ran query_test/test_tuple_cache_tpc_queries.py
 - Ran custom_cluster/test_tuple_cache.py

Change-Id: I449308f4a0efdca7fc238e3dda24985a2931dd37
Reviewed-on: http://gerrit.cloudera.org:8080/23495
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Yida Wu <[email protected]>
Reviewed-by: Joe McDonnell <[email protected]>


> Tuple caching verification crashes with DCHECK on "Duplicate scan range when 
> sorting"
> -------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14473
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14473
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 5.0.0
>            Reporter: Laszlo Gaal
>            Assignee: Joe McDonnell
>            Priority: Blocker
>
> Tuple caching verification build using the following settings:
> {code}
> TUPLE_CACHE_DIR=/data/jenkins/workspace/tmp
> TUPLE_CACHE_CAPACITY=20GB
> TUPLE_CACHE_DEBUG_DUMP_DIR=/data/jenkins/workspace/tmp
> {code}
> fails with a DCHECK and a core dump:
> {code}
> F20250930 20:58:39.068352 2476951 scheduler.cc:219] 
> 424af8fb8f6cc529:902f7f6200000000] Check failed: false Duplicate scan range 
> when sorting. Split 1: 
> THdfsFileSplit(relative_path=3a4c3e4dc68b0f78-c3a9f15e00000003_1382943121_data.0.parq,
>  offset=0, length=140407, partition_id=1372, file_length=140407, 
> file_compression=NONE, mtime=1759282690197, is_erasure_coded=0, 
> partition_path_hash=1359082287, absolute_path=, is_encrypted=0) Split 2: 
> THdfsFileSplit(relative_path=3a4c3e4dc68b0f78-c3a9f15e00000004_1978866414_data.0.parq,
>  offset=0, length=152474, partition_id=1366, file_length=152474, 
> file_compression=NONE, mtime=1759282690197, is_erasure_coded=0, 
> partition_path_hash=1359082260, absolute_path=, is_encrypted=0)
> {code}
> This crashes an impalad, which never recovers during EE_TEST, generating a 
> few hundred false test failures.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
