nsuthar-lumiq opened a new issue, #26196:
URL: https://github.com/apache/airflow/issues/26196
### Apache Airflow Provider(s)
amazon
### Versions of Apache Airflow Providers
5.0.0
### Apache Airflow version
>= 2.3.2
### Operating System
Linux
### Deployment
Docker-Compose
### Deployment details
docker-compose
### What happened
The **job_completion** method of **GlueJobHook** calls **print_job_logs** in a `finally` clause that is never reached once the job completes or fails: on completion the method returns the final state via a `return` statement (which, per the reporter, skips the `finally` block), and on failure it raises an exception (which likewise skips it).
As a result, Airflow does not show Glue job logs from CloudWatch.
```
def job_completion(self, job_name: str, run_id: str, verbose: bool = False) -> Dict[str, str]:
    """
    Waits until Glue job with job_name completes or
    fails and return final state if finished.
    Raises AirflowException when the job failed

    :param job_name: unique job name per AWS account
    :param run_id: The job-run ID of the predecessor job run
    :param verbose: If True, more Glue Job Run logs show in the Airflow
        Task Logs.  (default: False)
    :return: Dict of JobRunState and JobRunId
    """
    failed_states = ['FAILED', 'TIMEOUT']
    finished_states = ['SUCCEEDED', 'STOPPED']
    next_log_token = None
    job_failed = False

    while True:
        try:
            job_run_state = self.get_job_state(job_name, run_id)
            if job_run_state in finished_states:
                self.log.info('Exiting Job %s Run State: %s', run_id, job_run_state)
                return {'JobRunState': job_run_state, 'JobRunId': run_id}
            if job_run_state in failed_states:
                job_failed = True
                job_error_message = f'Exiting Job {run_id} Run State: {job_run_state}'
                self.log.info(job_error_message)
                raise AirflowException(job_error_message)
            else:
                self.log.info(
                    'Polling for AWS Glue Job %s current run state with status %s',
                    job_name,
                    job_run_state,
                )
                time.sleep(self.JOB_POLL_INTERVAL)
        finally:
            if verbose:
                next_log_token = self.print_job_logs(
                    job_name=job_name,
                    run_id=run_id,
                    job_failed=job_failed,
                    next_token=next_log_token,
                )
```
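One possible restructure is to flush logs at the top of every poll iteration, before the loop can return or raise, so the last batch of logs is emitted for both success and failure. The sketch below is a minimal, self-contained illustration of that idea, not the actual hook code: `get_state` and `print_logs` are hypothetical stand-ins for `self.get_job_state` and `self.print_job_logs`.

```python
FAILED_STATES = ['FAILED', 'TIMEOUT']
FINISHED_STATES = ['SUCCEEDED', 'STOPPED']


def wait_for_completion(get_state, print_logs, run_id, verbose=False):
    """Poll get_state() until a terminal state is reached.

    When verbose is True, print_logs is invoked on *every* iteration,
    including the final one, so logs are shown for both completed and
    failed runs.
    """
    next_token = None
    while True:
        state = get_state()
        if verbose:
            # Printing before returning/raising guarantees the final
            # batch of CloudWatch logs reaches the Airflow task log.
            next_token = print_logs(run_id, state in FAILED_STATES, next_token)
        if state in FINISHED_STATES:
            return {'JobRunState': state, 'JobRunId': run_id}
        if state in FAILED_STATES:
            raise RuntimeError(f'Exiting Job {run_id} Run State: {state}')
        # In the real hook this would be time.sleep(self.JOB_POLL_INTERVAL).
```

With this shape there is no reliance on a `finally` clause at all: the log-printing step is an ordinary statement executed once per poll.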
### What you think should happen instead
It should print the logs in all cases (failure or success) when `verbose=True`.
### How to reproduce
Use the latest version of the Amazon provider (5.0.0) and create an Airflow task for any Glue job.
Make sure to pass `verbose=True` to **GlueJobOperator** as below:
```
job_task = GlueJobOperator(
    task_id="testJob",
    job_name="<Glue job name>",
    job_desc="Test Job",
    region_name="ap-south-1",
    verbose=True,
    script_location="s3://..//../",
    num_of_dpus=2,
)
```
### Anything else
This is a code bug that triggers the issue every time. I will provide the fix along with an enhancement for continuous logging.
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]