[jira] [Commented] (IMPALA-13548) Add a mode to schedule scan ranges in order of modification time

ASF subversion and git services (Jira) Thu, 25 Sep 2025 16:34:04 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022923#comment-18022923
 ]


ASF subversion and git services commented on IMPALA-13548:
----------------------------------------------------------

Commit 775f73f03ea59401ca2752383182185599b9777d in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=775f73f03 ]

IMPALA-14462: Fix tie-breaking for sorting scan ranges oldest to newest

TestTupleCacheFullCluster.test_scan_range_distributed is flaky on S3
builds. The addition of a single file is changing scheduling significantly
even with scan ranges sorted oldest to newest. This is because modification
times on S3 have a granularity of one second. Multiple files have the
same modification time, and the fix for IMPALA-13548 did not properly
break ties for sorting.

This adds logic to break ties for files with the same modification
time. It compares the path (absolute path or relative path + partition)
as well as the offset within the file. These should be enough to break
all conceivable ties, as it is not possible to have two scan ranges with
the same file at the same offset. In debug builds, this does additional
validation to make sure that when a != b, comp(a, b) != comp(b, a).
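
As a rough illustration of the tie-breaking described above (not the actual
Impala C++ change), the ordering can be sketched as a comparison on
(modification time, path, offset). The ScanRange class, its field names, and
the file names are hypothetical; the validate() helper mirrors the debug-build
check that comp(a, b) != comp(b, a) whenever a != b.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class ScanRange:
    # Hypothetical stand-in for a scan range; field names are illustrative.
    mtime: int   # modification time in whole seconds (S3-like granularity)
    path: str    # absolute path, or relative path plus partition
    offset: int  # byte offset of the range within the file


def sort_key(r):
    # Oldest first; path and offset break ties between equal mtimes.
    return (r.mtime, r.path, r.offset)


def comp(a, b):
    """Return True if a must be ordered strictly before b."""
    return sort_key(a) < sort_key(b)


def validate(ranges):
    # Debug-style check: for distinct a and b, exactly one direction holds.
    for a, b in permutations(ranges, 2):
        assert comp(a, b) != comp(b, a), (a, b)


ranges = [
    ScanRange(99, "t/z.parq", 0),
    ScanRange(100, "t/b.parq", 0),
    ScanRange(100, "t/a.parq", 128),
    ScanRange(100, "t/a.parq", 0),
]
validate(ranges)

# With tie-breaking, the result does not depend on the input order.
ordered = sorted(ranges, key=sort_key)
ordered_from_reversed = sorted(reversed(ranges), key=sort_key)

# Sorting on mtime alone leaves the three tied ranges in input order.
mtime_only = sorted(ranges, key=lambda r: r.mtime)
mtime_only_from_reversed = sorted(reversed(ranges), key=lambda r: r.mtime)
```

In this sketch the two full-key sorts agree, while the two mtime-only sorts
differ, which is the kind of instability the commit addresses.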

The test requires that adding a single file to the table changes exactly
one cache key. If that final file has the same modification time as
an existing file, scheduling may still mix up the files and change more
than one cache key, even with tie-breaking. This adds a sleep just before
generating the final file to guarantee that it gets a newer modification
time.

Testing:
 - Ran TestTupleCacheFullCluster.test_scan_range_distributed for 15
   iterations on S3

Change-Id: I3f2e40d3f975ee370c659939da0374675a28cd38
Reviewed-on: http://gerrit.cloudera.org:8080/23458
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
Reviewed-by: Riza Suminto <riza.sumi...@cloudera.com>


> Add a mode to schedule scan ranges in order of modification time
> ----------------------------------------------------------------
>
>                 Key: IMPALA-13548
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13548
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>    Affects Versions: Impala 4.5.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>             Fix For: Impala 5.0.0
>
>
> When a file gets added to a table, the scheduler can have some instability in 
> how it assigns scan ranges. The scheduler is walking through the scan ranges 
> and handing them out in a single pass. If the new scan range is at the end of 
> the list, then there is minimal disruption. Every assignment would be the 
> same except the node that got the new scan range. However, if the new scan 
> range is early in the list, its assignment can change subsequent assignments 
> of other scan ranges. This can cascade and result in an entirely different 
> assignment.
> This is bad for the tuple cache, because it makes it difficult to get cache 
> hits for a table that is ingesting data.
> If the scan ranges were ordered by modification time (ascending), then new 
> scan ranges for an ingest would be at the end of the list and cause minimal 
> disruption.
> We should add a mode that does this.
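
The cascade described in the issue can be reproduced with a toy model. The
sketch below assumes a simple single-pass, least-loaded-node policy, which is
an assumption made for illustration and not Impala's actual scheduler; the
file names and sizes are made up.

```python
def assign(ranges, num_nodes):
    """Single pass: give each range to the currently least-loaded node."""
    loads = [0] * num_nodes
    result = {}
    for name, size in ranges:
        # Ties between equally loaded nodes go to the lowest node index.
        node = min(range(num_nodes), key=lambda i: (loads[i], i))
        result[name] = node
        loads[node] += size
    return result


def changed(before, after):
    """Count pre-existing ranges whose node assignment differs."""
    return sum(1 for name, node in before.items() if after[name] != node)


existing = [("f1", 10), ("f2", 20), ("f3", 30), ("f4", 10), ("f5", 20)]
new_range = ("new", 15)

base = assign(existing, 3)
appended = assign(existing + [new_range], 3)   # new range sorted last
prepended = assign([new_range] + existing, 3)  # new range sorted first

changed_when_appended = changed(base, appended)
changed_when_prepended = changed(base, prepended)
```

Under these assumptions, appending the new range leaves every existing
assignment untouched, while placing it first moves four of the five existing
ranges to a different node.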



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
