o-nikolas commented on code in PR #53990:
URL: https://github.com/apache/airflow/pull/53990#discussion_r2246178113
##########
providers/amazon/src/airflow/providers/amazon/aws/executors/aws_lambda/lambda_executor.py:
##########
@@ -425,17 +426,27 @@ def process_queue(self, queue_url: str):
"Successful Lambda invocation for task %s received
from SQS queue.", task_key
)
else:
- # In this case the Lambda likely started but failed at run
time since we got a non-zero
- # return code. We could consider retrying these tasks
within the executor, because this _likely_
- # means the Airflow task did not run to completion,
however we can't be sure (maybe the
- # lambda runtime code has a bug and is returning a
non-zero when it actually passed?). So
- # perhaps not retrying is the safest option.
self.fail(task_key)
- self.log.error(
- "Lambda invocation for task: %s has failed to run with
return code %s",
- task_key,
- return_code,
- )
+ if queue_url == self.dlq_url and return_code == None:
+ # DLQ failure: AWS Lambda service could not complete
the invocation after retries.
+ # This indicates a Lambda-level failure (timeout,
memory limit, crash, etc.)
+ # where the function was unable to successfully
execute to return a result.
+ self.log.error(
+ "Lambda invocation for task: %s failed at the
service level (from DLQ). Command: %s",
Review Comment:
The command is pretty large in the AF3 case, so I'm not sure we should log
it (it doesn't really have helpful information either). I also think you have
some very helpful stuff in the comment that might be nice to include in the
error message itself to give the user some context and pointers?
What about:
```suggestion
"DLQ message received: Lambda invocation for
task: %s was unable to successfully execute. This likely indicates a
Lambda-level failure (timeout, memory limit, crash, etc.).",
```
##########
providers/amazon/src/airflow/providers/amazon/aws/executors/aws_lambda/lambda_executor.py:
##########
@@ -425,17 +426,27 @@ def process_queue(self, queue_url: str):
"Successful Lambda invocation for task %s received
from SQS queue.", task_key
)
else:
- # In this case the Lambda likely started but failed at run
time since we got a non-zero
- # return code. We could consider retrying these tasks
within the executor, because this _likely_
- # means the Airflow task did not run to completion,
however we can't be sure (maybe the
- # lambda runtime code has a bug and is returning a
non-zero when it actually passed?). So
- # perhaps not retrying is the safest option.
self.fail(task_key)
- self.log.error(
- "Lambda invocation for task: %s has failed to run with
return code %s",
- task_key,
- return_code,
- )
+ if queue_url == self.dlq_url and return_code == None:
+ # DLQ failure: AWS Lambda service could not complete
the invocation after retries.
+ # This indicates a Lambda-level failure (timeout,
memory limit, crash, etc.)
+ # where the function was unable to successfully
execute to return a result.
+ self.log.error(
+ "Lambda invocation for task: %s failed at the
service level (from DLQ). Command: %s",
+ task_key,
+ command,
+ )
+ else:
+ # In this case the Lambda likely started but failed at
run time since we got a non-zero
+ # return code. We could consider retrying these tasks
within the executor, because this _likely_
+ # means the Airflow task did not run to completion,
however we can't be sure (maybe the
+ # lambda runtime code has a bug and is returning a
non-zero when it actually passed?). So
+ # perhaps not retrying is the safest option.
+ self.log.error(
+ "Lambda invocation for task: %s has failed to run
with return code %s",
Review Comment:
We should also probably differentiate this one a bit more as well.
```suggestion
"Lambda invocation for task: %s completed but
the underlying Airflow task has returned a non-zero exit code %s",
```
I'm also wondering if this branch should be log.debug? Since it's very
common for airlfow tasks to fail for a myriad of reasons, and we don't
necessarily need a error log in the scheduler logs for each one of those.
Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]