[ https://issues.apache.org/jira/browse/IMPALA-12146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889825#comment-17889825 ]

ASF subversion and git services commented on IMPALA-12146:
----------------------------------------------------------

Commit 40bb93fc4beb3b216aca942a01352feb535a6cb8 in impala's branch 
refs/heads/master from Yida Wu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=40bb93fc4 ]

IMPALA-12146: Fix incorrect host memory reserved when the executor quits 
abnormally

Currently, if an executor quits abnormally while running a query, its
reserved memory may remain in the coordinator's host stats.

The aggregated remote memory reserved for each host is computed from
all available remote pool stats. The problem occurs when the
statestore sends a topic update that changes the pool stats. During
the update, the coordinator removes the terminated executor's remote
stats from the pool, but UpdateClusterAggregates() fails to reset the
aggregated memory reserved for that host once all of the host's
remote stats have been removed. As a result, a stale memory reserved
value remains for the host.

To fix this, this change adds logic that resets the host's memory
reserved in the aggregated host stats when a topic deletion for the
host is detected and the host no longer appears in any remote pool
stats.

Tests:
Passed exhaustive tests.
Added test case AdmissionControllerTest::EraseHostStats.
Manually verified that the coordinator web UI correctly showed the
reserved memory after the crashed executor recovered and rejoined.

Change-Id: Ic6f6edd28c55904d63d0c494230ee2bf7a0f6cce
Reviewed-on: http://gerrit.cloudera.org:8080/21896
Tested-by: Impala Public Jenkins
Reviewed-by: Michael Smith


> Memory reserved doesn't get updated if an executor backend gets abnormally 
> terminated
> -------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12146
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12146
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Abhishek Rawat
>            Assignee: Yida Wu
>            Priority: Critical
>             Fix For: Impala 4.5.0
>
>
> If an executor backend is abnormally terminated, its memory reserved
> accounting doesn't get updated, and as a result the admission controller
> gets an incorrect view of the reserved memory for that backend. As a
> side effect, even if the terminated executor is restarted and added back
> to the cluster, its reserved memory is still incorrectly set to the
> value from before termination.
> Repro:
>  * Execute a long-running query.
>  * While the query is running, kill one of the Impalads that is an
> executor-only backend.
>  * Restart the executor backend that was abnormally terminated above.
>  * On the web UI, go to the /backends page: the 'Memory Reserved' for the
> executor backend is non-zero.
>  * Even after the session is closed, the 'Memory Reserved' for the
> executor backend remains non-zero.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
