vandonr-amz commented on code in PR #29245:
URL: https://github.com/apache/airflow/pull/29245#discussion_r1104947794


##########
airflow/providers/amazon/aws/operators/sagemaker.py:
##########
@@ -106,6 +108,41 @@ def _create_integer_fields(self) -> None:
         """
         self.integer_fields = []
 
+    def _get_unique_job_name(
+        self, proposed_name: str, fail_if_exists: bool, describe_func: 
Callable[[str], Any]
+    ) -> str:
+        """
+        Returns the proposed name if it doesn't already exist, otherwise 
returns it with a random suffix.
+
+        :param proposed_name: Base name.
+        :param fail_if_exists: Will throw an error if a job with that name 
already exists
+            instead of finding a new name.
+        :param describe_func: The `describe_` function for that kind of job.
+            We use it as an O(1) way to check if a job exists.
+        """
+        job_name = proposed_name
+        while self._check_if_job_exists(job_name, describe_func):
+            # this while should loop only once in most cases, just setting it 
this way to regenerate a name
+            # in case there is a random number collision.
+            if fail_if_exists:
+                raise AirflowException(f"A SageMaker job with name {job_name} 
already exists.")
+            else:
+                job_name = f"{proposed_name}-{random.randint(0, 999999999):09}"

Review Comment:
   It used to not be random, but that required to enumerate all jobs created 
since the beginning of times.
   Do you have a suggestion for a not random name ?
   I also wanted to keep the <name>-<digits> format.
   
   The best alternative I can see is to use the current date, but we still need 
to resolve intra-day (or intra-hour or whatever) conflicts.
   Listing all jobs for the day and adding a sequence number after that would 
be OK because it'd keep the number of jobs to list under control, but this 
behavior would also add a lot of complexity to this code, for limited benefits 
I think.
   
   Also, what benefits do you see in having a predictable job name ? The job 
name is returned by the operator, so following steps should use that returned 
value if they need the job name.
   If having a specific job name is important, the user can always setup the 
behavior to "fail" and come up with a programmatic naming scheme that will fit 
their usage, which will be easier than coming up with something that could fit 
all possible uses here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to