Gunther Hagleitner created TEZ-3405:
---------------------------------------
Summary: Support ability for AM to kill itself if there is no
client heartbeating to it
Key: TEZ-3405
URL: https://issues.apache.org/jira/browse/TEZ-3405
Project: Apache Tez
Issue Type: Bug
Reporter: Gunther Hagleitner
Priority: Critical
HiveServer2 optionally maintains a pool of AMs in either Tez or LLAP mode. This
is done to amortize the cost of launching a Tez session.
We also try to kill all these AMs in a shutdown hook when HS2 goes down.
However, there are cases where HS2 doesn't get the chance to kill these AMs
before it goes away. As a result, these zombie AMs hang around until the timeout
kicks in.
The trouble with the timeout is that we have to set it fairly high. Otherwise,
on a lightly loaded cluster, idle AMs time out before they are reused and the
benefit of pre-launching them goes away.
So when people kill or restart HS2, they often find that the cluster/queue has
no capacity left for new AMs. They then have to either kill the zombies
manually or wait for the timeout.
The request is therefore for Tez to maintain a heartbeat to the client. If the
client goes away, the AM should exit. That way we can keep the AMs alive for a
long time regardless of activity, and at the same time not have to worry about
them if HS2 goes down.
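
A rough sketch of the AM-side bookkeeping this would need, for illustration
only. The class name ClientHeartbeatMonitor and its methods are hypothetical
and not part of any existing Tez API; the sketch assumes the client (HS2) calls
into the AM periodically and that the AM checks for expiry on its own schedule.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not an existing Tez class): tracks the last time a
 * client heartbeat was seen and reports whether the client should be
 * considered gone.
 */
final class ClientHeartbeatMonitor {
    private final long timeoutMillis;
    private final AtomicLong lastHeartbeatMillis;

    /**
     * @param timeoutMillis how long the AM tolerates silence from the client
     * @param nowMillis     current time, used as the initial heartbeat
     */
    ClientHeartbeatMonitor(long timeoutMillis, long nowMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = new AtomicLong(nowMillis);
    }

    /** Called whenever the client pings the AM. */
    void recordHeartbeat(long nowMillis) {
        lastHeartbeatMillis.set(nowMillis);
    }

    /** True if no heartbeat has arrived for longer than the timeout. */
    boolean isExpired(long nowMillis) {
        return nowMillis - lastHeartbeatMillis.get() > timeoutMillis;
    }
}
```

In such a design the AM would call isExpired(System.currentTimeMillis()) from
a periodic task and shut itself down once it returns true. The heartbeat
timeout can then be short (seconds to minutes), independent of the long idle
timeout that keeps pre-launched AMs around.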