vandonr-amz commented on code in PR #29245:
URL: https://github.com/apache/airflow/pull/29245#discussion_r1104947794
##########
airflow/providers/amazon/aws/operators/sagemaker.py:
##########
@@ -106,6 +108,41 @@ def _create_integer_fields(self) -> None:
"""
self.integer_fields = []
+ def _get_unique_job_name(
+ self, proposed_name: str, fail_if_exists: bool, describe_func:
Callable[[str], Any]
+ ) -> str:
+ """
+ Returns the proposed name if it doesn't already exist, otherwise
returns it with a random suffix.
+
+ :param proposed_name: Base name.
+ :param fail_if_exists: Will throw an error if a job with that name
already exists
+ instead of finding a new name.
+ :param describe_func: The `describe_` function for that kind of job.
+ We use it as an O(1) way to check if a job exists.
+ """
+ job_name = proposed_name
+ while self._check_if_job_exists(job_name, describe_func):
+ # this while should loop only once in most cases, just setting it
this way to regenerate a name
+ # in case there is a random number collision.
+ if fail_if_exists:
+ raise AirflowException(f"A SageMaker job with name {job_name}
already exists.")
+ else:
+ job_name = f"{proposed_name}-{random.randint(0, 999999999):09}"
Review Comment:
It used to not be random, but that required to enumerate all jobs created
since the beginning of times.
Do you have a suggestion for a not random name ?
I also wanted to keep the <name>-<digits> format.
The best alternative I can see is to use the current date, but we still need
to resolve intra-day (or intra-hour or whatever) conflicts.
Listing all jobs for the day and adding a sequence number after that would
be OK because it'd keep the number of jobs to list under control, but this
behavior would also add a lot of complexity to this code, for limited benefits
I think.
Also, what benefits do you see in having a predictable job name ? The job
name is returned by the operator, so following steps should use that returned
value if they need the job name.
If having a specific job name is important, the user can always setup the
behavior to "fail" and come up with a programmatic naming scheme that will fit
their usage, which will be easier than coming up with something that could fit
all possible uses here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]