LarsAlmgren opened a new issue, #27556:
URL: https://github.com/apache/airflow/issues/27556

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   We tag our resources in AWS. Setting `Tags` in the `GlueCrawlerOperator` 
config works the first time, when Airflow creates the crawler. On subsequent 
runs, however, the task fails because the boto3 Glue client's `get_crawler()` 
response does not include the tags, so the comparison in `update_crawler` 
raises the error below.
   ```
   [2022-11-08, 14:48:49 ] {taskinstance.py:1774} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/glue_crawler.py", line 80, in execute
       self.hook.update_crawler(**self.config)
     File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue_crawler.py", line 86, in update_crawler
       key: value for key, value in crawler_kwargs.items() if current_crawler[key] != crawler_kwargs[key]
     File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue_crawler.py", line 86, in <dictcomp>
       key: value for key, value in crawler_kwargs.items() if current_crawler[key] != crawler_kwargs[key]
   KeyError: 'Tags'
   ```
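
The failure can be reproduced without AWS access. This is a minimal sketch, assuming only that the crawler description has no `Tags` key; the dictionaries and the `changed_keys` helper are hypothetical stand-ins, not real API output or provider code.

```python
# Hypothetical stand-in for the crawler description returned by get_crawler();
# it deliberately has no "Tags" key.
current_crawler = {"Name": "my_crawler", "Role": "some-role"}

# Config passed to the operator, including Tags.
crawler_kwargs = {
    "Name": "my_crawler",
    "Role": "some-role",
    "Tags": {"TheTag": "value-of-my-tag"},
}


def changed_keys(current, wanted):
    # Same shape as the comparison in the traceback: current[key] is indexed
    # unconditionally, so any key missing from `current` raises KeyError.
    return {key: value for key, value in wanted.items() if current[key] != wanted[key]}


try:
    changed_keys(current_crawler, crawler_kwargs)
    error = None
except KeyError as exc:
    error = exc.args[0]

print(error)
```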
   
   ### What you think should happen instead
   
   Ignore tags when checking if the crawler should be updated.
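
One possible shape for that comparison is sketched below. The function name `diff_crawler_config` and the `ignored_keys` parameter are assumptions made for illustration, not the provider's actual API.

```python
def diff_crawler_config(current_crawler, crawler_kwargs, ignored_keys=("Tags",)):
    """Return the entries of crawler_kwargs that differ from current_crawler.

    Keys listed in ignored_keys are skipped entirely, and .get() is used so
    that a key missing from current_crawler counts as "changed" instead of
    raising KeyError.
    """
    return {
        key: value
        for key, value in crawler_kwargs.items()
        if key not in ignored_keys and current_crawler.get(key) != value
    }


# Hypothetical example data.
current = {"Name": "my_crawler", "Role": "old-role"}
wanted = {"Name": "my_crawler", "Role": "new-role", "Tags": {"TheTag": "value"}}

update = diff_crawler_config(current, wanted)
print(update)
```

With this approach a changed tag would never trigger an update, so tags would have to be synced separately, for example through the Glue `tag_resource` API.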
   
   ### How to reproduce
   
   Use `GlueCrawlerOperator` with `Tags` in the config, as below, and trigger 
the task more than once. The second run fails.
   ```
   from airflow.providers.amazon.aws.operators.glue_crawler import GlueCrawlerOperator

   # `dag` is assumed to be an existing DAG object.
   GlueCrawlerOperator(
       dag=dag,
       task_id="the_task_id",
       config={
           "Name": "name_of_the_crawler",
           "Role": "some-role",
           "DatabaseName": "some_database",
           "Targets": {"S3Targets": [{"Path": "s3://..."}]},
           "TablePrefix": "a_table_prefix",
           "RecrawlPolicy": {
               "RecrawlBehavior": "CRAWL_EVERYTHING"
           },
           "SchemaChangePolicy": {
               "UpdateBehavior": "UPDATE_IN_DATABASE",
               "DeleteBehavior": "DELETE_FROM_DATABASE"
           },
           # Tags are accepted when the crawler is created but break later updates.
           "Tags": {
               "TheTag": "value-of-my-tag"
           },
       },
   )
   ```
   
   ### Operating System
   
   Ubuntu
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-cncf-kubernetes==3.0.0
   apache-airflow-providers-google==6.7.0
   apache-airflow-providers-amazon==3.2.0
   apache-airflow-providers-slack==4.2.3
   apache-airflow-providers-http==2.1.2
   apache-airflow-providers-mysql==2.2.3
   apache-airflow-providers-ssh==2.4.3
   apache-airflow-providers-jdbc==2.1.3
   ```
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   Airflow v2.2.5
   Self-hosted Airflow in Kubernetes.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

