[ https://issues.apache.org/jira/browse/IMPALA-11979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048614#comment-18048614 ]

ASF subversion and git services commented on IMPALA-11979:
----------------------------------------------------------

Commit 411309acf4d3f326f05dfa04749a0ec0e2ccc801 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=411309acf ]

IMPALA-11979: Add 'scheduling_seed' to customize consistent scheduling behavior

This adds a startup parameter --scheduling_seed, which is a string
identifier for an executor within an executor group. It should
be unique within an executor group, but it can be reused
across executor groups on the same system. The seed is used for
scan range scheduling on remote filesystems, so it can be used
to make scheduling deterministic across multiple executor
groups, or when an executor group is restarted on machines
with different IP addresses.

For example, the 3rd executor in an executor group of size 8
might use "executor_3_of_8" as its scheduling seed. If there
are multiple executor groups of size 8, the 3rd executor in
each can use that same scheduling seed.
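The commit message does not include the scheduler code. As a rough illustration of why a stable seed helps, here is a minimal Python sketch of a consistent-hash ring (the issue description says the scheduler uses a hash ring keyed on executor IP) that can be keyed either on each executor's IP or on its scheduling seed. Everything here is a hypothetical assumption, not Impala's C++ implementation: the `HashRing` class, the `assign` helper, the SHA-256 hash, the replica count, and the example paths and IPs.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; not Impala's actual implementation."""

    def __init__(self, node_keys, replicas=10):
        # Each node key (an IP address or a scheduling seed) is hashed to
        # several points on the ring to spread scan ranges more evenly.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in node_keys for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._points]

    def owner(self, scan_range: str) -> str:
        # A scan range goes to the first ring point at or after its hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_left(self._hashes, _hash(scan_range))
        return self._points[idx % len(self._points)][1]


def assign(scan_ranges, executors, use_seed=True):
    """Map each scan range to the scheduling seed of the executor that gets it.

    executors is a list of (ip_address, scheduling_seed) pairs; the ring is
    keyed on the seed when use_seed is True, otherwise on the IP address.
    """
    key_to_seed = {(seed if use_seed else ip): seed for ip, seed in executors}
    ring = HashRing(key_to_seed.keys())
    return {r: key_to_seed[ring.owner(r)] for r in scan_ranges}


# Hypothetical remote-filesystem scan ranges and an 8-executor group.
ranges = [f"s3a://bucket/warehouse/t/part-{i}.parq" for i in range(100)]
before = [(f"10.0.0.{i}", f"executor_{i}_of_8") for i in range(1, 9)]
# The same group restarted on machines with different IP addresses.
after = [(f"10.0.1.{i + 20}", f"executor_{i}_of_8") for i in range(1, 9)]

print(assign(ranges, before) == assign(ranges, after))  # True
```

Because a seed-keyed ring is built only from the seeds, restarting the group with new IPs leaves every assignment unchanged. With use_seed=False, the ring points move whenever the IPs do, so assignments will generally differ between the two runs.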

Testing:
 - Ran core job

Change-Id: Ie01c7d119cc88766082dbfca3ff685354d01f71f
Reviewed-on: http://gerrit.cloudera.org:8080/22214
Reviewed-by: Joe McDonnell <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Support option for consistent scan range scheduling across executor invocations with changing IP address
> --------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-11979
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11979
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 4.2.0
>            Reporter: David Rorke
>            Priority: Major
>
> The scheduler currently assigns scan ranges to executor hosts using a hash 
> ring with the executor IP used as the hash key.  This results in varying scan 
> range assignments across multiple instantiations of an executor if the 
> executor IP changes.  This makes it more difficult to get repeatable 
> scheduling behavior when debugging or performance testing in an environment 
> where executor IPs can be assigned dynamically.
>
> We should support a mode where the scheduler's executor hashing uses a fixed 
> executor identifier that we specify (unique within an executor group). This 
> could also be useful for potential future optimizations like cache prewarming.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
