[ 
https://issues.apache.org/jira/browse/AIRFLOW-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102317#comment-17102317
 ] 

ASF GitHub Bot commented on AIRFLOW-2310:
-----------------------------------------

abdulbasitds commented on pull request #6007:
URL: https://github.com/apache/airflow/pull/6007#issuecomment-625666407


   > I ran some end-to-end tests using this code in one of my dag files. It's working ok if you make those changes following my comments.
   > 
   > Right now I'm trying to retrieve the logs generated by Glue jobs so that we could print out the logs in Airflow. Without the logs in Airflow, debugging failed jobs is a hassle.
   > 
   > Update:
   > Obtaining logs is easy, just add this in the operator class:
   > 
   > ```python
   >     GLUE_LOGS_GROUP = "/aws-glue/jobs/output"
   >     GLUE_ERRS_GROUP = "/aws-glue/jobs/error"
   > ```
   > 
   > as class attributes
   > 
   > ```python
   >     # Note: this method needs `from datetime import datetime`; the
   >     # AwsLogsHook import path depends on the Airflow version in use.
   >     def get_glue_logs(self, log_group_name, log_stream_name):
   >         """Glue logs are too chatty, only get the ones that have errors"""
   >         self.log.info('Glue job logs output from group %s:', log_group_name)
   >         for event in self.get_logs_hook().get_log_events(
   >             log_group_name,
   >             log_stream_name,
   >         ):
   >             event_dt = datetime.fromtimestamp(event['timestamp'] / 1000.0)
   >             event_msg = event['message']
   >             # Glue logs are extremely chatty, we only get log entries that have "error"
   >             if "error" in event_msg:
   >                 self.log.info("[%s] %s", event_dt.isoformat(), event_msg)
   > 
   >     def get_logs_hook(self):
   >         """Create and return an AwsLogsHook."""
   >         return AwsLogsHook(
   >             aws_conn_id=self.aws_conn_id,
   >             region_name=self.awslogs_region
   >         )
   > ```
   > 
   > as class methods, and call it in `execute()`
   > 
   > ```python
   > glue_job_run_id = glue_job_run['JobRunId']
   > 
   > self.get_glue_logs(self.GLUE_LOGS_GROUP, glue_job_run_id)
   > ```
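   For reference, the error-filtering logic in the quoted `get_glue_logs()` can be exercised on its own. The sketch below is a hypothetical, self-contained stand-in (no AWS access needed) that consumes CloudWatch-style event dicts with `timestamp` in epoch milliseconds and `message`, the same shape `AwsLogsHook.get_log_events()` yields; the function name `filter_error_events` is made up for this illustration:

   ```python
   from datetime import datetime

   def filter_error_events(events):
       """Yield (iso_timestamp, message) pairs for events mentioning "error".

       Mirrors the filter in the quoted get_glue_logs(): Glue logs are
       extremely chatty, so only entries containing "error" are kept.
       """
       for event in events:
           # CloudWatch timestamps are epoch milliseconds
           event_dt = datetime.fromtimestamp(event['timestamp'] / 1000.0)
           event_msg = event['message']
           if "error" in event_msg:
               yield event_dt.isoformat(), event_msg

   sample = [
       {'timestamp': 1588886400000, 'message': 'INFO starting job'},
       {'timestamp': 1588886401000, 'message': 'ERROR: oops'},
       {'timestamp': 1588886402000, 'message': 'py4j error in transform'},
   ]
   print([msg for _, msg in filter_error_events(sample)])
   # → ['py4j error in transform']
   ```

   Note that the substring match is case-sensitive, so a message logged as "ERROR" is skipped unless the check is relaxed, e.g. `if "error" in event_msg.lower()`.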
   
   
   
   @zachliu maybe we can extend logging later. For now it would be difficult for me to provide a description for the argument and to make sure the code is correct, as I haven't used logging much.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Enable AWS Glue Job Integration
> -------------------------------
>
>                 Key: AIRFLOW-2310
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2310
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>            Reporter: Olalekan Elesin
>            Assignee: Olalekan Elesin
>            Priority: Major
>              Labels: AWS
>
> Would it be possible to integrate AWS Glue into Airflow, such that Glue jobs 
> and ETL pipelines can be orchestrated with Airflow?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
