andormarkus commented on pull request #16571:
URL: https://github.com/apache/airflow/pull/16571#issuecomment-890095899
Hi @ferruzzi,
I would like to describe a problem we are running into when we
create EKS managed node groups with boto3.
We have 3 environments in 3 separate AWS accounts. By AWS design, AZ names are
mapped to physical zones independently for each account, so an instance type
that is available in AZ 1 on account A might not be available in AZ 1 on
account B. We are running into this issue:
```bash
Your requested instance type (m5ad.4xlarge) is not supported in your
requested Availability Zone (eu-central-1c). Please retry your request by not
specifying an Availability Zone or choosing eu-central-1a, eu-central-1b.
```
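To illustrate, a rough sketch of how the supported zones could be looked up per account before creating the node group. It uses the EC2 `describe_instance_type_offerings` call; the function name `zones_offering` is hypothetical and is not part of this PR.

```python
def zones_offering(ec2_client, instance_type):
    """Return the sorted AZ names in the client's region that offer instance_type.

    ec2_client is assumed to be a boto3 EC2 client, e.g.
    boto3.client("ec2", region_name="eu-central-1").
    """
    zones = set()
    kwargs = {
        "LocationType": "availability-zone",
        "Filters": [{"Name": "instance-type", "Values": [instance_type]}],
    }
    while True:
        page = ec2_client.describe_instance_type_offerings(**kwargs)
        zones.update(
            offering["Location"]
            for offering in page.get("InstanceTypeOfferings", [])
        )
        # Follow pagination until the API stops returning a token.
        token = page.get("NextToken")
        if not token:
            return sorted(zones)
        kwargs["NextToken"] = token
```

The result could then be used to choose only subnets in supported zones for that account, though that still would not cover every way a create call can fail.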
In this case, node group creation fails with `CREATE_FAILED` status and
the failed node group still exists on EKS. When Airflow retries the task, the
second attempt fails with the following error:
```bash
botocore.errorfactory.ResourceInUseException: An error occurred
(ResourceInUseException) when calling the CreateNodegroup operation: NodeGroup
already exists with name [my_node] and cluster name [my_cluster]
```
It would be great if this integration had an option to delete the failed
node group when a create job fails with `CREATE_FAILED`.
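A minimal sketch of the cleanup I have in mind, assuming a plain boto3 EKS client. The helper name `delete_failed_nodegroup` is hypothetical and is not something that exists in this PR; it only uses the `describe_nodegroup`, `delete_nodegroup`, and `nodegroup_deleted` waiter calls from boto3.

```python
def delete_failed_nodegroup(eks_client, cluster_name, nodegroup_name):
    """Delete the node group only if it exists and is in CREATE_FAILED state.

    eks_client is assumed to be a boto3 EKS client, e.g. boto3.client("eks").
    Returns True if a failed node group was deleted, False otherwise.
    """
    try:
        response = eks_client.describe_nodegroup(
            clusterName=cluster_name, nodegroupName=nodegroup_name
        )
    except eks_client.exceptions.ResourceNotFoundException:
        # Nothing left over from a previous attempt.
        return False

    if response["nodegroup"]["status"] != "CREATE_FAILED":
        # Leave healthy or still-creating node groups alone.
        return False

    eks_client.delete_nodegroup(
        clusterName=cluster_name, nodegroupName=nodegroup_name
    )
    # Block until EKS reports the node group as gone, so the name is free.
    eks_client.get_waiter("nodegroup_deleted").wait(
        clusterName=cluster_name, nodegroupName=nodegroup_name
    )
    return True
```

Calling something like this before `create_nodegroup` on a retry would avoid the `ResourceInUseException`, since the name is released before the second attempt.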
I hope this is clear. If not, feel free to ask any questions.
Thanks,
Andor