[ https://issues.apache.org/jira/browse/AIRFLOW-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535321#comment-16535321 ]

ASF subversion and git services commented on AIRFLOW-2706:
----------------------------------------------------------

Commit 0c5ebcbd1e1b26664061f2db889748f0085d02fe in incubator-airflow's branch 
refs/heads/master from [~cforster]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=0c5ebcb ]

[AIRFLOW-2706] AWS Batch Operator should use top-level job state to determine status

Rather than inspecting the state of job attempts,
the operator should use the top-level job status
to determine the overall success or failure of the
task. This means the following cases are handled
correctly:

1. Any infrastructure failure that results in no
   attempts being performed is now detected.
2. Any retry policy configured in AWS Batch is now
   honored: the job isn't marked FAILED until all
   retry attempts have failed. Previously, the
   first failed *attempt* would mark the task as
   failed.

Closes #3567 from craigforster/master
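
The difference between the two approaches can be sketched as follows. This
is only an illustration, not the operator's actual code: the function names
check_job_success and attempts_only_check are hypothetical, the
attempts-based check is a guess at the general shape of the old behaviour,
and the input is assumed to be a dict shaped like the describe-jobs
response quoted in the issue below.

```python
def check_job_success(response):
    """Decide success from the top-level job status (hypothetical helper)."""
    jobs = response.get("jobs", [])
    if not jobs:
        raise RuntimeError("No job found in response")
    job = jobs[0]
    status = job.get("status")
    if status == "SUCCEEDED":
        return True
    if status == "FAILED":
        # statusReason is set even when no attempt was ever started.
        reason = job.get("statusReason", "no reason given")
        raise RuntimeError("Job failed: {}".format(reason))
    raise RuntimeError("Job in unexpected state: {}".format(status))


def attempts_only_check(response):
    """Assumed shape of an attempts-based check, shown only for contrast."""
    for attempt in response["jobs"][0].get("attempts", []):
        if attempt.get("container", {}).get("exitCode", 0) != 0:
            return False
    # With an empty attempts list the loop never runs, so the job looks fine.
    return True
```

With the response from the issue (status FAILED, attempts empty), the
attempts-based check has nothing to inspect and reports success, while the
top-level check raises with the statusReason "Role is not valid".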


> AWS Batch Operator doesn't detect failure if there were no job attempts
> -----------------------------------------------------------------------
>
>                 Key: AIRFLOW-2706
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2706
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: aws
>            Reporter: Craig Forster
>            Assignee: Craig Forster
>            Priority: Major
>             Fix For: 2.0.0
>
>
> During initial deployment testing of our AWS Batch environment using Airflow 
> to co-ordinate, we had a few false starts while we fixed IAM roles.  However, 
> these failed jobs weren't detected as failed by Airflow.
> I believe the issue lies in _check_success_task; the failure check loops over 
> the attempts array, but in this case there are no attempts to check.
> Logs:
> {noformat}
> {awsbatch_operator.py:150} INFO - AWS Batch stopped, check status: 
> {
>   "ResponseMetadata": {
>     "RequestId": "51084897-7d90-11e8-be75-7b511f9b010d",
>     "HTTPStatusCode": 200,
>     "HTTPHeaders": {
>       "date": "Mon, 02 Jul 2018 00:39:02 GMT",
>       "content-type": "application/json",
>       "content-length": "1142",
>       "connection": "keep-alive",
>       "x-amzn-requestid": "51084897-7d90-11e8-be75-7b511f9b010d",
>       "x-amz-apigw-id": "JX8V_HOyPHcF5KA=",
>       "x-amzn-trace-id": "Root=1-5b397426-058a6d1ce4d7569273c05bd4"
>     },
>     "RetryAttempts": 0
>   },
>   "jobs": [
>     {
>       "jobName": "snip-20180317",
>       "jobId": "2ea0def8-1e7f-4a5c-bd1e-3f0a3acc035c",
>       "jobQueue":
>         "arn:aws:batch:us-west-2:snip:job-queue/snip-829f351459741d3",
>       "status": "FAILED",
>       "attempts": [],
>       "statusReason": "Role is not valid",
>       "createdAt": 1530491934164,
>       "retryStrategy": { "attempts": 1 },
>       "dependsOn": [],
>       "jobDefinition":
>         "arn:aws:batch:us-west-2:snip:job-definition/snip-job-definition:4",
>       "parameters": {},
>       "container": {
>         "image":
>           "snip.dkr.ecr.eu-central-1.amazonaws.com/snip:latest",
>         "vcpus": 1,
>         "memory": 2048,
>         "command": [],
>         "jobRoleArn":
>           "arn:aws:iam::snip:instance-profile/common-instance-profile-us2-sandbox",
>         "volumes": [],
>         "environment": [
>           { SNIP }
>         ],
>         "mountPoints": [],
>         "ulimits": [],
>         "privileged": True
>       }
>     }
>   ]
> }
> {awsbatch_operator.py:110} INFO - AWS Batch Job has been successfully executed:
> {
>   "ResponseMetadata": {
>     "RequestId": "4c255dd7-7d90-11e8-988b-c9ea0b25c469",
>     "HTTPStatusCode": 200,
>     "HTTPHeaders": {
>       "date": "Mon, 02 Jul 2018 00:38:54 GMT",
>       "content-type": "application/json",
>       "content-length": "111",
>       "connection": "keep-alive",
>       "x-amzn-requestid": "4c255dd7-7d90-11e8-988b-c9ea0b25c469",
>       "x-amz-apigw-id": "JX8UtH6VvHcFcVg=",
>       "x-amzn-trace-id": "Root=1-5b39741e-577ea13c82751664daac335e"
>     },
>     "RetryAttempts": 0
>   },
>   "jobName": "snip-20180317",
>   "jobId": "2ea0def8-1e7f-4a5c-bd1e-3f0a3acc035c"
> }
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
