bahram-cdt commented on PR #62016:
URL: https://github.com/apache/airflow/pull/62016#issuecomment-3909687236

   > It kinds of makes sense to receive this exception no? You're trying to 
update or a start a new job but it fails to do so because there is already one 
running. What do you think?
   
   Good point! The exception is indeed telling us something meaningful — but 
I'd argue the correct response is to wait for the existing run, not to fail the 
task.
   
   The most common cause in production is **retry-induced race conditions**: the 
first `start_crawler()` call succeeds server-side, but its response is lost to a 
network timeout, so boto3's built-in retry fires the call a second time and hits a 
crawler that is already running. The user didn't do anything wrong, and the 
crawler will complete successfully, but the Airflow task fails and triggers 
false alerts.
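
   For illustration, here's roughly what that failure mode looks like at the plain 
boto3 level (the crawler name and retry settings are made up for this sketch, not 
taken from the PR): the retried `StartCrawler` attempt surfaces as 
`CrawlerRunningException` even though the crawl it raced with is healthy.

   ```python
   import boto3
   from botocore.config import Config

   # Illustrative client with boto3's standard retry mode (values are arbitrary).
   glue = boto3.client(
       "glue",
       config=Config(retries={"mode": "standard", "max_attempts": 3}),
   )

   try:
       # If the first attempt succeeds server-side but its response is lost to a
       # network timeout, botocore transparently retries, and the second attempt
       # finds the crawler already RUNNING.
       glue.start_crawler(Name="example-crawler")  # hypothetical crawler name
   except glue.exceptions.CrawlerRunningException:
       # This is the case the operator currently turns into a task failure, even
       # though a crawl is in flight and will finish normally.
       pass
   ```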
   
   Since the operator already supports `wait_for_completion`, the natural 
behavior when a crawler is already running is to wait for it — the end state is 
identical to starting a fresh run and waiting.
   
   For `update_crawler`, I agree the case is slightly weaker (we're skipping a 
config update), but the config rarely changes between runs, and the next 
successful run will pick it up. Failing the whole task seems disproportionate.
   
   An alternative design: we could add a `fail_on_already_running: bool = False` 
parameter so the current failing behavior stays available as an opt-in; or, if the 
team prefers a non-breaking default, the flag could default to `True` and users 
would opt in to waiting instead. Happy to adjust!
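
   To make that concrete, here's a rough sketch of what the execute path could look 
like with such a flag. It is written against plain boto3 rather than the provider's 
`GlueCrawlerHook`, and the function name, the `fail_on_already_running` parameter, 
and the polling loop are all illustrative, not existing API:

   ```python
   import time

   import boto3

   from airflow.exceptions import AirflowException


   def start_or_join_crawler(
       config: dict,
       wait_for_completion: bool = True,
       fail_on_already_running: bool = False,
       poll_interval: int = 15,
   ) -> None:
       """Sketch of the proposed behavior; names and structure are illustrative."""
       glue = boto3.client("glue")
       name = config["Name"]

       # Apply the config update, but tolerate an in-flight run unless told otherwise.
       try:
           glue.update_crawler(**config)
       except glue.exceptions.CrawlerRunningException:
           if fail_on_already_running:
               raise AirflowException(f"Crawler {name} is already running")
           # The update is skipped; a later run, after update_crawler succeeds,
           # will pick up the new config.

       # Start a run, or quietly join the one that is already in flight.
       try:
           glue.start_crawler(Name=name)
       except glue.exceptions.CrawlerRunningException:
           if fail_on_already_running:
               raise AirflowException(f"Crawler {name} is already running")

       if wait_for_completion:
           # Poll until the crawler is idle again, then check how the crawl ended.
           while glue.get_crawler(Name=name)["Crawler"]["State"] != "READY":
               time.sleep(poll_interval)
           last_crawl = glue.get_crawler(Name=name)["Crawler"].get("LastCrawl", {})
           if last_crawl.get("Status") != "SUCCEEDED":
               raise AirflowException(
                   f"Crawler {name} finished with status {last_crawl.get('Status')}"
               )
   ```

   In the operator itself the same logic would presumably sit behind the existing 
`wait_for_completion` flag, with the waiting delegated to the hook.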


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
