Joe McDonnell created IMPALA-13548:
--------------------------------------

             Summary: Add a mode to schedule scan ranges in order of 
modification time
                 Key: IMPALA-13548
                 URL: https://issues.apache.org/jira/browse/IMPALA-13548
             Project: IMPALA
          Issue Type: Task
          Components: Backend
    Affects Versions: Impala 4.5.0
            Reporter: Joe McDonnell


When a file is added to a table, the scheduler's scan range assignments can be 
unstable. The scheduler walks through the scan ranges and hands them out in a 
single pass. If the new scan range is at the end of the list, there is minimal 
disruption: every existing assignment stays the same, and only the node that 
receives the new scan range is affected. However, if the new scan range is 
early in the list, its assignment can change the subsequent assignments of 
other scan ranges. This can cascade and result in an entirely different 
assignment.

This is bad for the tuple cache, because it makes it difficult to get cache 
hits for a table that is ingesting data.

If the scan ranges were ordered by modification time (ascending), then new scan 
ranges for an ingest would be at the end of the list and cause minimal 
disruption.
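
The effect can be illustrated with a toy model. The sketch below is not 
Impala's scheduler: it assumes a hypothetical single-pass greedy policy that 
gives each scan range to the node with the fewest bytes assigned so far, and 
the ScanRange type, the assign() and changed() helpers, and the file names 
are all made up for illustration. It compares adding a new file "b" when 
ranges are ordered by path versus by modification time.

```python
from typing import NamedTuple


class ScanRange(NamedTuple):
    path: str
    length: int  # bytes
    mtime: int   # modification time (hypothetical units)


def assign(ranges, num_nodes):
    """Toy single-pass scheduler: each range goes to the least-loaded node.

    Ties are broken by the lowest node index, so the result is deterministic.
    """
    load = [0] * num_nodes
    assignment = {}
    for r in ranges:
        node = min(range(num_nodes), key=lambda n: (load[n], n))
        assignment[r.path] = node
        load[node] += r.length
    return assignment


def changed(before, after):
    """Paths whose node differs between two assignments."""
    return sorted(p for p in before if before[p] != after[p])


existing = [
    ScanRange("a", 10, 1),
    ScanRange("c", 20, 2),
    ScanRange("d", 10, 3),
    ScanRange("e", 20, 4),
]
new_file = ScanRange("b", 15, 5)  # newest file, but sorts second by path

baseline = assign(existing, num_nodes=2)

by_path = assign(sorted(existing + [new_file], key=lambda r: r.path), 2)
by_mtime = assign(sorted(existing + [new_file], key=lambda r: r.mtime), 2)

print(changed(baseline, by_path))   # ['c', 'd', 'e']
print(changed(baseline, by_mtime))  # []
```

In this model, ordering by path moves three of the four existing ranges to a 
different node, while ordering by modification time leaves every existing 
assignment untouched, because the scheduler sees the same prefix of ranges 
before it reaches the new one.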

We should add a scheduler mode that orders scan ranges this way.



