bahram-cdt opened a new pull request, #62016:
URL: https://github.com/apache/airflow/pull/62016

   ## What
   
   Handle `CrawlerRunningException` in `GlueCrawlerOperator.execute()` instead of letting it fail the Airflow task.
   
   ## Why
   
   When `start_crawler()` or `update_crawler()` is called while the crawler is already running (e.g., from a retry, overlapping DAG run, or boto3 internal retry after a timeout), the AWS Glue API raises `CrawlerRunningException`. Currently this propagates as an unhandled `ClientError`, causing the Airflow task to fail even though the crawler run completes successfully.
   
   This is a common issue in production: the Glue console shows the crawler succeeded, but Airflow marks the task as failed and triggers alerts.
   
   ## What Changed
   
   
   **`providers/amazon/src/airflow/providers/amazon/aws/operators/glue_crawler.py`**
   - Wrapped `update_crawler()` with try/except: catches `CrawlerRunningException` and logs a warning (skips the update since the crawler is busy).
   - Wrapped `start_crawler()` with try/except: catches `CrawlerRunningException` and logs a warning (waits for the existing run instead of failing).
   - All other `ClientError` codes are re-raised as before.
   - Added `from botocore.exceptions import ClientError` import.
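
The error-code check described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the helper name `start_or_join_crawler` and the boto3-style `glue_client` argument are assumptions made for the example, and the fallback `ClientError` class exists only so the snippet runs where botocore is not installed.

```python
import logging

try:
    from botocore.exceptions import ClientError
except ImportError:
    # Stand-in (assumption) mirroring the attributes of botocore's ClientError,
    # so the sketch still runs without botocore installed.
    class ClientError(Exception):
        def __init__(self, error_response, operation_name):
            super().__init__(f"{operation_name}: {error_response}")
            self.response = error_response
            self.operation_name = operation_name


log = logging.getLogger(__name__)


def start_or_join_crawler(glue_client, crawler_name):
    """Hypothetical helper: start the crawler, or join a run already in progress.

    Returns True if this call started the run, False if one was already running.
    """
    try:
        glue_client.start_crawler(Name=crawler_name)
    except ClientError as err:
        # Only the "already running" code is tolerated; anything else propagates.
        if err.response["Error"]["Code"] != "CrawlerRunningException":
            raise
        log.warning("Crawler %s is already running; waiting for that run.", crawler_name)
        return False
    return True
```

Matching on `err.response["Error"]["Code"]` rather than on the exception class keeps the handler narrow, since botocore reports every service error through the same `ClientError` type.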
   
   **`providers/amazon/tests/unit/amazon/aws/operators/test_glue_crawler.py`**
   - `test_execute_crawler_running_on_start`: verifies `CrawlerRunningException` on `start_crawler` is caught and the operator waits for the existing run.
   - `test_execute_crawler_running_on_update`: verifies `CrawlerRunningException` on `update_crawler` is caught and `start_crawler` is still called.
   - `test_execute_other_client_error_on_start_raises`: verifies non-`CrawlerRunningException` errors on `start_crawler` propagate.
   - `test_execute_other_client_error_on_update_raises`: verifies non-`CrawlerRunningException` errors on `update_crawler` propagate.
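
For readers unfamiliar with this style of test, the shape of the update-path cases might look something like the sketch below. Everything here is illustrative: `FakeClientError`, `run_crawler`, and the mocked `hook` are hypothetical stand-ins, not the operator or the tests added in this PR.

```python
from unittest import mock


class FakeClientError(Exception):
    """Stand-in (assumption) exposing the same `response` shape as botocore's ClientError."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def run_crawler(hook, name):
    """Hypothetical reduction of the operator flow: update the crawler, then start it."""
    for call in (hook.update_crawler, hook.start_crawler):
        try:
            call(name)
        except FakeClientError as err:
            # Tolerate only the "already running" code; re-raise everything else.
            if err.response["Error"]["Code"] != "CrawlerRunningException":
                raise


def test_running_on_update_still_starts():
    hook = mock.MagicMock()
    hook.update_crawler.side_effect = FakeClientError("CrawlerRunningException")
    run_crawler(hook, "my_crawler")
    hook.start_crawler.assert_called_once_with("my_crawler")


def test_other_error_on_update_raises():
    hook = mock.MagicMock()
    hook.update_crawler.side_effect = FakeClientError("AccessDeniedException")
    try:
        run_crawler(hook, "my_crawler")
    except FakeClientError:
        # The unrelated error propagated before start_crawler was reached.
        hook.start_crawler.assert_not_called()
    else:
        raise AssertionError("expected the error to propagate")
```

Setting `side_effect` on the mocked method is what makes the call raise, which lets each case exercise one branch of the handler without touching AWS.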
   
   ## How to Test
   
   ```python
   # Simulate CrawlerRunningException
   from botocore.exceptions import ClientError
   error = ClientError(
       error_response={"Error": {"Code": "CrawlerRunningException", "Message": "Already running"}},
       operation_name="StartCrawler",
   )
   # Previously: operator.execute() raises ClientError -> task fails
   # Now: operator catches it, logs warning, waits for existing run -> task succeeds
   ```
   
   

