LarsAlmgren opened a new issue, #27556:
URL: https://github.com/apache/airflow/issues/27556
### Apache Airflow version
Other Airflow 2 version (please specify below)
### What happened
We are using tags on resource in AWS. When setting tags when using
`GlueCrawlerOperator` it works the first time, when Airflow creates the
crawler. However on subsequent runs in fails because `boto3.get_crawler()` does
not return the Tags. Hence we get the error below.
```
[2022-11-08, 14:48:49 ] {taskinstance.py:1774} ERROR - Task failed with
exception
Traceback (most recent call last):
File
"/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/glue_crawler.py",
line 80, in execute
self.hook.update_crawler(**self.config)
File
"/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue_crawler.py",
line 86, in update_crawler
key: value for key, value in crawler_kwargs.items() if
current_crawler[key] != crawler_kwargs[key]
File
"/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue_crawler.py",
line 86, in <dictcomp>
key: value for key, value in crawler_kwargs.items() if
current_crawler[key] != crawler_kwargs[key]
KeyError: 'Tags'
```
### What you think should happen instead
Ignore tags when checking if the crawler should be updated.
### How to reproduce
Use `GlueCrawlerOperator` with Tags like below and trigger the task multiple
times. It will fail the second time around.
```
GlueCrawlerOperator(
dag=dag,
task_id="the_task_id",
config={
"Name": "name_of_the_crwaler",
"Role": "some-role",
"DatabaseName": "some_database",
"Targets": {"S3Targets": [{"Path": "s3://..."}]},
"TablePrefix": "a_table_prefix",
"RecrawlPolicy": {
"RecrawlBehavior": "CRAWL_EVERYTHING"
},
"SchemaChangePolicy": {
"UpdateBehavior": "UPDATE_IN_DATABASE",
"DeleteBehavior": "DELETE_FROM_DATABASE"
},
"Tags": {
"TheTag": "value-of-my-tag"
}
}
```
### Operating System
Ubuntu
### Versions of Apache Airflow Providers
```
apache-airflow-providers-cncf-kubernetes==3.0.0
apache-airflow-providers-google==6.7.0
apache-airflow-providers-amazon==3.2.0
apache-airflow-providers-slack==4.2.3
apache-airflow-providers-http==2.1.2
apache-airflow-providers-mysql==2.2.3
apache-airflow-providers-ssh==2.4.3
apache-airflow-providers-jdbc==2.1.3
```
### Deployment
Other 3rd-party Helm chart
### Deployment details
Airflow v2.2.5
Self-hosted Airflow in Kubernetes.
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]