Lin Wen created HAWQ-979:
----------------------------
Summary: Resource Broker Should Reconnect Hadoop Yarn When Failed
to Get Cluster Report
Key: HAWQ-979
URL: https://issues.apache.org/jira/browse/HAWQ-979
Project: Apache HAWQ
Issue Type: Bug
Components: Resource Manager
Reporter: Lin Wen
Assignee: Lei Chang
While HAWQ is running in YARN mode, the libyarn heartbeat thread can sometimes
fail and quit (e.g. when the YARN RM restarts), and the following warning is logged:
2016-08-03 18:45:27.913838
PDT,,,p34645,th-1290610400,,,,0,con4,,seg-10000,,,,,"WARNING","01000","YARN
mode resource broker failed to get YARN queue report of queue default.
LibYarnClient::getQueueInfo, Catch the Exception:LibYarnClient::libyarn AM
heartbeat thread has stopped.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1840,
In this case the resource broker process should re-register HAWQ with YARN, but
it does not.
The reason is:
In function handleRM2RB_GetClusterReport(), when RB2YARN_getQueueReport()
fails, function sendRBGetClusterReportErrorData() is called, but
sendRBGetClusterReportErrorData() returns OK (it should return RESBROK_ERROR_GRM).
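
The intended control flow might look roughly like the sketch below. It is a
simplified, standalone model, not the actual HAWQ source. The function names
come from the description above, but their signatures, the numeric values of
the return codes, the name FUNC_RETURN_OK for "OK", and the helpers
processOneRequest() and reregisterToYarn() are assumptions made for
illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed return codes; the real values are defined in HAWQ's headers. */
#define FUNC_RETURN_OK    0
#define RESBROK_ERROR_GRM 1

/* Simulated state: whether the libyarn heartbeat thread is still alive. */
static bool yarn_alive = true;
static int  register_count = 0;

/* Stand-in for the libyarn call; fails once the heartbeat thread has stopped. */
static int RB2YARN_getQueueReport(void)
{
    return yarn_alive ? FUNC_RETURN_OK : RESBROK_ERROR_GRM;
}

/* Reports the failure, then returns an error instead of OK so the caller
 * can tell that the cluster report was not obtained. */
static int sendRBGetClusterReportErrorData(int errorcode)
{
    fprintf(stderr, "failed to get cluster report, error %d\n", errorcode);
    return RESBROK_ERROR_GRM; /* the issue says OK is returned here today */
}

static int handleRM2RB_GetClusterReport(void)
{
    int res = RB2YARN_getQueueReport();
    if (res != FUNC_RETURN_OK)
    {
        return sendRBGetClusterReportErrorData(res);
    }
    return FUNC_RETURN_OK;
}

/* Hypothetical helper: re-registers HAWQ with YARN. */
static void reregisterToYarn(void)
{
    yarn_alive = true;
    register_count++;
}

/* Hypothetical caller: reacts to the propagated error by re-registering. */
static int processOneRequest(void)
{
    int res = handleRM2RB_GetClusterReport();
    if (res == RESBROK_ERROR_GRM)
    {
        reregisterToYarn();
    }
    return res;
}
```

With the error code propagated, a failed request triggers one re-registration
and the next request succeeds. If sendRBGetClusterReportErrorData() returned
OK, the caller would never see the failure.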