vandonr-amz commented on code in PR #29522:
URL: https://github.com/apache/airflow/pull/29522#discussion_r1136097037
##########
airflow/providers/amazon/aws/hooks/batch_client.py:
##########
@@ -419,8 +419,46 @@ def get_job_awslogs_info(self, job_id: str) -> dict[str, str] | None:
         :param job_id: AWS Batch Job ID
         """
-        job_container_desc = self.get_job_description(job_id=job_id).get("container", {})
-        log_configuration = job_container_desc.get("logConfiguration", {})
+        job_desc = self.get_job_description(job_id=job_id)
Review Comment:
ok I see your point, but should there really be one log link per job?
I'm looking at it, and it seems that in the case of a multinode job there are multiple `log_configuration` entries (one per node), and from each log config we get
- the log group
- the region
I'd imagine that multinode Batch jobs would not be multi-region, so the region would be constant across all nodes.
Also, I suppose that in the overwhelming majority of cases the log group would be the same for all nodes (it would be very weird if it weren't).
Then we get the stream name from the attempts, which does not depend on the number of nodes. In most cases there would be a single attempt; if there are more, we choose to return the stream name of the last attempt, which makes sense.
The job runs on many nodes, but the logs all end up in the same log stream.
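For illustration, a minimal sketch of that choice, assuming the boto3 `describe_jobs` response shape where each attempt carries a `container.logStreamName` (the names here are illustrative, not the PR's actual code):

```python
# Hypothetical sketch: pick the log stream of the most recent attempt.
# Assumes the boto3 describe_jobs() shape, where each entry in "attempts"
# carries container.logStreamName; yields None if there are no attempts yet.
attempts = job_desc.get("attempts", [])
stream_name = attempts[-1].get("container", {}).get("logStreamName") if attempts else None
```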
What we can do is iterate over the log configs to make sure they are all sending logs
- to AWS
- in the same region
- in the same group

and log a warning if that's not the case (a sketch of such a check follows).
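A minimal sketch of that validation, assuming the boto3 `describe_jobs` shape for multinode jobs (per-node `logConfiguration` under `nodeProperties.nodeRangeProperties[*].container`); the helper name and warning texts are hypothetical, not what the PR ships:

```python
import logging

log = logging.getLogger(__name__)


def check_log_configs_consistent(job_desc: dict) -> None:
    """Warn when per-node log configurations disagree (hypothetical helper)."""
    node_ranges = job_desc.get("nodeProperties", {}).get("nodeRangeProperties", [])
    log_configs = [nr.get("container", {}).get("logConfiguration", {}) for nr in node_ranges]
    # Collect (driver, region, group) per node; awslogs-region and awslogs-group
    # are the standard options of the awslogs (CloudWatch) log driver.
    seen = {
        (
            cfg.get("logDriver"),
            cfg.get("options", {}).get("awslogs-region"),
            cfg.get("options", {}).get("awslogs-group"),
        )
        for cfg in log_configs
    }
    if any(driver != "awslogs" for driver, _, _ in seen):
        log.warning("Some nodes do not send their logs to CloudWatch (awslogs driver)")
    if len(seen) > 1:
        log.warning("Nodes use different log regions or groups; the returned log link may be incomplete")
```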