bahram-cdt commented on PR #62016: URL: https://github.com/apache/airflow/pull/62016#issuecomment-3909687236
> It kinds of makes sense to receive this exception no? You're trying to update or a start a new job but it fails to do so because there is already one running. What do you think?

Good point! The exception is indeed telling us something meaningful, but I'd argue the correct response is to wait for the existing run, not to fail the task.

The most common cause in production is **retry-induced race conditions**: boto3's built-in retry fires `start_crawler()` a second time after a network timeout on the first (successful) call. The user did nothing wrong, and the crawler will complete successfully, but the Airflow task fails and triggers false alerts.

Since the operator already supports `wait_for_completion`, the natural behavior when a crawler is already running is to wait for it; the end state is identical to starting a fresh run and waiting.

For `update_crawler`, I agree the case is slightly weaker (we'd be skipping a config update), but the config rarely changes between runs, and the next successful run will pick it up. Failing the whole task seems disproportionate.

An alternative design: we could add a `fail_on_already_running: bool = False` parameter to make failing opt-in, if the team prefers a non-breaking-default approach. Happy to adjust!
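To make the proposal concrete, here is a rough sketch of the wait-instead-of-fail behavior. This is illustrative only, not the actual operator code: `StubGlueClient` is a local fake standing in for boto3's Glue client, and the local `CrawlerRunningException` class stands in for the real `client.exceptions.CrawlerRunningException`.

```python
import time


class CrawlerRunningException(Exception):
    """Local stand-in for boto3's client.exceptions.CrawlerRunningException."""


class StubGlueClient:
    """Minimal fake of the Glue API, used only to make this sketch runnable."""

    def __init__(self):
        self._state = "READY"
        self._polls = 0

    def start_crawler(self, Name):
        if self._state == "RUNNING":
            raise CrawlerRunningException(f"Crawler {Name} is already running")
        self._state = "RUNNING"

    def get_crawler(self, Name):
        # Pretend the in-flight run finishes after a couple of polls.
        if self._state == "RUNNING":
            self._polls += 1
            if self._polls >= 2:
                self._state = "READY"
        return {"Crawler": {"Name": Name, "State": self._state}}


def run_crawler(client, name, wait_for_completion=True, poll_interval=0.01):
    """Start the crawler; if one is already running, wait for it instead of failing."""
    try:
        client.start_crawler(Name=name)
    except CrawlerRunningException:
        # Retry-induced race: a run is already in flight. Waiting for it
        # ends in the same state as starting a fresh run and waiting.
        if not wait_for_completion:
            raise
    if wait_for_completion:
        while client.get_crawler(Name=name)["Crawler"]["State"] != "READY":
            time.sleep(poll_interval)
    return client.get_crawler(Name=name)["Crawler"]["State"]


client = StubGlueClient()
assert run_crawler(client, "demo") == "READY"   # normal start-and-wait
client._state = "RUNNING"                       # simulate a duplicate retry call
assert run_crawler(client, "demo") == "READY"   # waits instead of failing
```

With a `fail_on_already_running` parameter, the `except` branch would simply re-raise when that flag is set, preserving today's behavior for anyone who opts in.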
