[ 
https://issues.apache.org/jira/browse/IMPALA-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854795#comment-16854795
 ] 

ASF subversion and git services commented on IMPALA-8469:
---------------------------------------------------------

Commit 6b3e5fe426a7cd8b13c18a54fe6c2726ab8667d8 in impala's branch 
refs/heads/master from Lars Volker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6b3e5fe ]

IMPALA-8460: Simplify cluster membership management

This change adds a class to track cluster membership called
ClusterMembershipMgr. It replaces the logic that was partially
duplicated between the ImpalaServer and the Coordinator and makes sure
that the local backend descriptor is consistent (IMPALA-8469).

The ClusterMembershipMgr maintains a view of the cluster membership and
incorporates incoming updates from the statestore. It also registers the
local backend with the statestore after startup. Clients can obtain a
consistent, immutable snapshot of the current cluster membership from
the ClusterMembershipMgr. Additionally, callbacks can be registered to
receive notifications of cluster membership changes. The ImpalaServer
and Frontend use this mechanism.
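
The snapshot-and-callback pattern described above can be sketched roughly as
follows. This is a minimal illustration, not the actual ClusterMembershipMgr
code: the names MembershipTracker, BackendInfo, Snapshot and ApplyUpdate are
hypothetical, and the real class works on thrift structs and statestore topic
deltas. The sketch assumes copy-on-write updates, so readers holding an older
snapshot never see it change.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a backend descriptor.
struct BackendInfo {
  std::string address;
  bool is_executor;
};

// View of the cluster at one point in time. Never modified once published.
struct Snapshot {
  std::map<std::string, BackendInfo> backends;  // keyed by backend id
};

// Hypothetical tracker illustrating immutable snapshots plus update callbacks.
class MembershipTracker {
 public:
  using SnapshotPtr = std::shared_ptr<const Snapshot>;
  using UpdateCallback = std::function<void(const SnapshotPtr&)>;

  MembershipTracker() : current_(std::make_shared<Snapshot>()) {}

  // Returns the current snapshot. The caller can keep using it without
  // holding any lock, even while later updates are applied.
  SnapshotPtr GetSnapshot() const {
    std::lock_guard<std::mutex> l(lock_);
    return current_;
  }

  // Registers a callback that is invoked after every applied update.
  void RegisterCallback(UpdateCallback cb) {
    std::lock_guard<std::mutex> l(lock_);
    callbacks_.push_back(std::move(cb));
  }

  // Copies the current snapshot, applies the changes to the copy, publishes
  // the copy, then notifies the callbacks outside of the lock.
  void ApplyUpdate(const std::vector<std::pair<std::string, BackendInfo>>& added,
                   const std::vector<std::string>& removed) {
    SnapshotPtr new_snapshot;
    std::vector<UpdateCallback> callbacks;
    {
      std::lock_guard<std::mutex> l(lock_);
      auto copy = std::make_shared<Snapshot>(*current_);
      for (const auto& entry : added) copy->backends[entry.first] = entry.second;
      for (const auto& id : removed) copy->backends.erase(id);
      current_ = std::move(copy);
      new_snapshot = current_;
      callbacks = callbacks_;
    }
    for (const auto& cb : callbacks) cb(new_snapshot);
  }

 private:
  mutable std::mutex lock_;
  SnapshotPtr current_;
  std::vector<UpdateCallback> callbacks_;
};
```

Handing out shared pointers to const snapshots means a query that started
scheduling against one view of the cluster keeps that view, while later
callers see the updated membership.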

This change also generalizes the fix for IMPALA-7665: updates from the
statestore to the cluster membership topic are only made visible to the
rest of the local server after a post-recovery grace period has elapsed.
As part of this, the flag
'failed_backends_query_cancellation_grace_period_ms' is replaced with
'statestore_subscriber_recovery_grace_period_ms'. To tell the initial
startup from post-recovery, a new metric
'statestore-subscriber.num-connection-failures' is exposed by the
daemon, which tracks the total number of connection failures to the
statestore over the lifetime of the process.
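
One way to read the grace period logic is sketched below. It is only an
illustration under stated assumptions: SubscriberState and
ShouldPublishUpdates are hypothetical names, and the sketch assumes that a
failure count of zero means initial startup (no waiting), while any non-zero
count means updates are held back until the grace period has elapsed since the
connection was last re-established.

```cpp
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Hypothetical summary of the statestore subscriber's connection state.
struct SubscriberState {
  int64_t num_connection_failures;       // total failures since process start
  Clock::time_point last_recovery_time;  // when the connection last came back
};

// Returns true if membership updates may be made visible to the rest of the
// local server. Assumption: a process that has never lost its connection is
// in initial startup and does not need to wait.
inline bool ShouldPublishUpdates(const SubscriberState& state,
                                 Clock::time_point now,
                                 std::chrono::milliseconds grace_period) {
  if (state.num_connection_failures == 0) return true;
  return now - state.last_recovery_time >= grace_period;
}
```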

This change also unifies the naming of executor-related classes; in
particular, it renames "BackendConfig" to "ExecutorGroup". In
anticipation of a subsequent change (IMPALA-8484), it adds maps to store
multiple executor groups.

This change also disables the generation of default operators from the
thrift files and instead adds explicit implementations for the ones that
we rely on. This forces us to explicitly specify comparators when
manipulating containers of thrift structs and will help prevent
accidental bugs.
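
As an illustration of what explicit comparators look like at a call site, here
is a minimal sketch. NetworkAddress is a hypothetical hand-written stand-in
for a generated struct that has no operator< or operator==, and AddressLess,
AddressEquals and SortAndDedup are made-up helper names, not code from this
change.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a generated struct without default operators.
struct NetworkAddress {
  std::string hostname;
  int port;
};

// Explicit ordering: by hostname first, then by port.
struct AddressLess {
  bool operator()(const NetworkAddress& a, const NetworkAddress& b) const {
    return std::tie(a.hostname, a.port) < std::tie(b.hostname, b.port);
  }
};

// Explicit equality over the same fields that the ordering uses.
inline bool AddressEquals(const NetworkAddress& a, const NetworkAddress& b) {
  return a.hostname == b.hostname && a.port == b.port;
}

// Sorts the addresses and removes duplicates. Both steps name the comparator
// they use, so the comparison semantics are visible at the call site.
inline void SortAndDedup(std::vector<NetworkAddress>* addrs) {
  std::sort(addrs->begin(), addrs->end(), AddressLess());
  addrs->erase(std::unique(addrs->begin(), addrs->end(), AddressEquals),
               addrs->end());
}
```

Without default operators, code such as std::sort(v.begin(), v.end()) on these
structs fails to compile, which is the point: the author has to decide which
fields take part in the comparison.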

Testing: This change adds a backend unit test for the new cluster
membership manager. The observable behavior of Impala does not change,
and the existing scheduler unit test and end-to-end tests should
confirm that.

Change-Id: Ib3cf9a8bb060d0c6e9ec8868b7b21ce01f8740a3
Reviewed-on: http://gerrit.cloudera.org:8080/13207
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Admit memory not set in backend descriptor for coordinator-only nodes
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-8469
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8469
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Ian Buss
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: admission-control, resource-management
>             Fix For: Impala 3.3.0
>
>
> When configuring admission control with dedicated coordinator daemons, 
> queries in pools with memory limits fail with admission rejections like 
> the following:
> {noformat}
> Rejected query from pool root.default: request memory needed 3.00 GB per node 
> is greater than memory available for admission 0 of coord1.example.com:22000. 
> Use the MEM_LIMIT query option to indicate how much memory is required per 
> node.{noformat}
> Tracing this in the code leads us to line 576 of {{admission-controller.cc}} 
> and therefore to suspect that the local {{TBackendDescriptor}} 
> ({{local_backend_descriptor_}}) for the coordinator node in {{scheduler.cc}} 
> never has {{admit_mem_limit}} set, and thus ends up with the default value of 
> 0.
> The issue goes away if NO_SPECIALIZATION is used instead of COORDINATOR_ONLY.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
