[
https://issues.apache.org/jira/browse/SLIDER-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894919#comment-15894919
]
Gour Saha commented on SLIDER-1209:
-----------------------------------
Thanks for reviewing [~billie.rinaldi]. I tested it manually in my cluster and
it looks OK for a few cases. I did not get a chance to simulate all the
scenarios covering all the enum values.
One of the apps, which was gracefully stopped via the stop command, has the
following exitReason in diagnostics:
{code}
{
  "finalStatus": "SUCCEEDED",
  "finalMessage": "stop command issued",
  "exitReason": "STOP_COMMAND_ISSUED",
  "containers": [
    {
      "containerId": "container_e3378_1488324757330_0011_01_000002",
      "component": "LLAP",
      "state": 4,
      "exitCode": 0,
      "diagnostics": "Application stop triggered",
      "createTime": 1488568441199,
      "startTime": 1488568441272,
      "completionTime": 1488568686173,
      "host": "host5.example.com",
      "hostURL": "http://host5.example.com:8042",
      "logLink": "http://host7.example.com:19888/jobhistory/logs/host5.example.com:45454/container_e3378_1488324757330_0011_01_000002/ctx/root"
    },
    .
    .
}
{code}
Another one, where I simulated a failure (by manually killing the app
containers) and the app ultimately died, has the following exitReason in
diagnostics:
{code}
{
  "finalStatus": "FAILED",
  "finalMessage": "Unstable Application Instance : - failed with component LLAP failed 'recently' 2 times (2 in startup); threshold is 1 - last failure: Failure container_e3378_1488324757330_0009_01_000002 on host host6.example.com (0): http://host7.example.com:19888/jobhistory/logs/host6.example.com:45454/container_e3378_1488324757330_0009_01_000002/ctx/root",
  "exitReason": "SLIDER_AM_ERROR",
  "containers": [
    {
      "containerId": "container_e3378_1488324757330_0009_01_000007",
      "component": "LLAP",
      "state": 4,
      "exitCode": 0,
      "createTime": 1488556767038,
      "startTime": 1488556767113,
      "completionTime": 1488556818069,
      "host": "host9.example.com",
      "hostURL": "http://host9.example.com:8042",
      "logLink": "http://host7.example.com:19888/jobhistory/logs/host9.example.com:45454/container_e3378_1488324757330_0009_01_000007/ctx/root"
    },
    {
      "containerId": "container_e3378_1488324757330_0009_01_000002",
      "component": "LLAP",
      "state": 4,
      "exitCode": 0,
      "createTime": 1488556767048,
      "startTime": 1488556767244,
      "completionTime": 1488556819070,
      "host": "host6.example.com",
      "hostURL": "http://host6.example.com:8042",
      "logLink": "http://host7.example.com:19888/jobhistory/logs/host6.example.com:45454/container_e3378_1488324757330_0009_01_000002/ctx/root"
    }
  ],
  "recentFailedContainers": [
    "container_e3378_1488324757330_0009_01_000007",
    "container_e3378_1488324757330_0009_01_000002"
  ]
}
{code}
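For anyone consuming these diagnostics, here is a rough sketch of how a caller might tell a requested stop apart from other exits using the exitReason field shown in the two dumps above. It is not part of the patch: the class name ExitReasonCheck and its helper methods are hypothetical, and the regex is only a stand-in for a proper JSON parser so that the example stays self-contained. Only the two exitReason values seen above are assumed to exist.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Slider: reads exitReason out of a diagnostics dump.
public class ExitReasonCheck {
    // Matches the "exitReason" field as it appears in the dumps above.
    // A real client would use a JSON parser instead of a regex.
    private static final Pattern EXIT_REASON =
        Pattern.compile("\"exitReason\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the exitReason value, or empty if the field is absent. */
    static Optional<String> exitReason(String diagnosticsJson) {
        Matcher m = EXIT_REASON.matcher(diagnosticsJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True only when the diagnostics report an explicit stop request. */
    static boolean stoppedByRequest(String diagnosticsJson) {
        return exitReason(diagnosticsJson)
            .map("STOP_COMMAND_ISSUED"::equals)
            .orElse(false);
    }

    public static void main(String[] args) {
        // Trimmed-down versions of the two dumps above.
        String stopped =
            "{ \"finalStatus\": \"SUCCEEDED\", \"exitReason\": \"STOP_COMMAND_ISSUED\" }";
        String failed =
            "{ \"finalStatus\": \"FAILED\", \"exitReason\": \"SLIDER_AM_ERROR\" }";
        System.out.println(stoppedByRequest(stopped));
        System.out.println(stoppedByRequest(failed));
    }
}
```

Checking exitReason rather than finalStatus or finalMessage keeps the caller independent of the free-form message text.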
I am trying to add some tests for this patch now.
> Provide information on whether a slider app was killed / stopped via a request
> ------------------------------------------------------------------------------
>
> Key: SLIDER-1209
> URL: https://issues.apache.org/jira/browse/SLIDER-1209
> Project: Slider
> Issue Type: Sub-task
> Components: appmaster, client
> Reporter: Siddharth Seth
> Assignee: Gour Saha
> Fix For: Slider 1.0.0
>
> Attachments: SLIDER-1209.01.patch
>
>
> I am adding a new enum SliderExitReason with the high level reason for an
> application failure.
> For most of the cases it is difficult to decipher if the Slider app failed
> due to an application error. This gap can be bridged a little better when we
> get to SLIDER-1208.