Joe McDonnell created IMPALA-8677:
-------------------------------------
Summary: Removing an unused node does not leave consistent remote
scheduling unchanged
Key: IMPALA-8677
URL: https://issues.apache.org/jira/browse/IMPALA-8677
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 3.2.0
Reporter: Joe McDonnell
When working on IMPALA-8630, I discovered that
SchedulerTest::RemoteExecutorCandidateConsistency passes mostly by happenstance.
The root of the issue is that Scheduler::GetRemoteExecutorCandidates() avoids
returning duplicates by putting all the IpAddrs in a set:
{code:java}
set<IpAddr> distinct_backends;
...
distinct_backends.insert(*executor_addr);
...
for (const IpAddr& addr : distinct_backends) {
  remote_executor_candidates->push_back(addr);
}{code}
Iterating over the set yields the IpAddrs in sorted order, so
remote_executor_candidates does not hold the elements in the order in which
they were encountered on the hashring.
Suppose that we are running with num_remote_executor_candidates=2 and random
replica set to false. There is exactly one file. GetRemoteExecutorCandidates()
returns these executor candidates (IpAddrs):
{192.168.1.2, 192.168.1.3}
The first entry, 192.168.1.2, is chosen because it sorts first. Nothing was
scheduled on 192.168.1.3, but removing it may still change the scheduling
outcome because of the sort. Suppose 192.168.1.3 is gone and the next closest
executor is 192.168.1.1 (or any node that sorts before 192.168.1.2). Even
though it is farther away on the hashring, GetRemoteExecutorCandidates() would
return:
{192.168.1.1, 192.168.1.2}
and the first entry would be chosen.
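The scenario above can be reproduced with a minimal sketch. It assumes IpAddr
is modeled as a plain string, and SortedCandidates() is a hypothetical
stand-in for the deduplication step only, not the actual scheduler code. Its
input is the list of executors in the order a hashring walk would encounter
them.

```cpp
#include <set>
#include <string>
#include <vector>

// Assumption: IpAddr is modeled as a plain string for this sketch.
using IpAddr = std::string;

// Hypothetical stand-in for the deduplication step. `ring_order` lists the
// executors in the order in which the hashring walk encounters them.
std::vector<IpAddr> SortedCandidates(const std::vector<IpAddr>& ring_order) {
  std::set<IpAddr> distinct_backends(ring_order.begin(), ring_order.end());
  // std::set iterates in sorted order, so the encounter order is lost here.
  return std::vector<IpAddr>(distinct_backends.begin(), distinct_backends.end());
}
```

With ring order {192.168.1.2, 192.168.1.3} the first candidate is 192.168.1.2.
With ring order {192.168.1.2, 192.168.1.1} the first candidate becomes
192.168.1.1, even though 192.168.1.2 is still the closest node on the ring.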
To eliminate this inconsistency, it might be useful to retain the order in
which elements match via the hashring.
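One possible way to retain hashring order while still dropping duplicates is
sketched below. This is an illustration under the same assumptions (IpAddr
modeled as a string, OrderedCandidates() a hypothetical helper), not a
proposed patch: a hash set tracks which addresses have been seen, and the
output vector keeps the order of first encounter.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Assumption: IpAddr is modeled as a plain string for this sketch.
using IpAddr = std::string;

// Hypothetical helper: removes duplicates from `ring_order` while keeping
// the order in which each address was first encountered.
std::vector<IpAddr> OrderedCandidates(const std::vector<IpAddr>& ring_order) {
  std::unordered_set<IpAddr> seen;
  std::vector<IpAddr> candidates;
  for (const IpAddr& addr : ring_order) {
    // insert() reports whether the address was newly added to the set.
    if (seen.insert(addr).second) candidates.push_back(addr);
  }
  return candidates;
}
```

With this approach, ring order {192.168.1.2, 192.168.1.1} keeps 192.168.1.2
as the first candidate, so removing the unused 192.168.1.3 would not change
which executor is chosen.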
In terms of impact, the current behavior increases the number of files whose
scheduling can change when a node leaves, so some of those changes are
unnecessary. With random replica set to true, it doesn't matter. It is unclear
how much impact this has otherwise.