[ https://issues.apache.org/jira/browse/BEAM-13760?focusedWorklogId=717544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-717544 ]

ASF GitHub Bot logged work on BEAM-13760:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Jan/22 13:52
            Start Date: 29/Jan/22 13:52
    Worklog Time Spent: 10m 
      Work Description: thorbjorn444 commented on a change in pull request #16641:
URL: https://github.com/apache/beam/pull/16641#discussion_r795054678



##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -421,10 +422,16 @@ def _build_default_job_name(user_name):
     user_name = re.sub('[^-a-z0-9]', '', user_name.lower())
     date_component = datetime.utcnow().strftime('%m%d%H%M%S-%f')
     app_user_name = 'beamapp-{}'.format(user_name)
-    job_name = '{}-{}'.format(app_user_name, date_component)
+    # append 8 random alphanumeric characters to avoid collisions.
+    random_component = ''.join(
+        random.choices(str(uuid.uuid4()).replace('-', ''), k=8))

Review comment:
       That makes total sense to me. The previous implementation is a bit 
path-dependent ("We can resolve this by appending a UUID" -> "That's too long, 
I'll just select a few random characters from the sequence").
   
   Please see the updated implementation. 
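For comparison: the random component in the diff above draws 8 characters, with replacement, from the 32 hex digits of a single UUID. The result is therefore hexadecimal (0-9, a-f), not fully alphanumeric as the code comment says, and its distribution depends on which digits happen to repeat in that one UUID. The sketch below contrasts that approach with two more direct ways to get 8 random hex characters. It is illustrative only and is not the updated implementation referred to in the comment; the helper names are hypothetical.

```python
import random
import secrets
import string
import uuid

# The 16 lowercase hex digits '0'-'9' and 'a'-'f'.
HEX_DIGITS = set(string.hexdigits.lower())


def random_component_from_diff(k=8):
    # Mirrors the reviewed diff: sample k characters, with replacement,
    # from the 32 hex digits of one fresh UUID4.
    return ''.join(random.choices(str(uuid.uuid4()).replace('-', ''), k=k))


def random_component_slice(k=8):
    # Hypothetical alternative: take the first k hex digits of a UUID4.
    # The first 8 digits are all random bits; the fixed version and
    # variant digits of a UUID4 appear later in the string.
    return uuid.uuid4().hex[:k]


def random_component_token(nbytes=4):
    # Hypothetical alternative: nbytes random bytes as 2 * nbytes hex digits.
    return secrets.token_hex(nbytes)


for fn in (random_component_from_diff, random_component_slice,
           random_component_token):
    value = fn()
    print(fn.__name__, value, len(value))
```

Both alternatives yield 32 uniformly random bits in 8 characters, which avoids the two-step "UUID, then sample from it" construction.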






Issue Time Tracking
-------------------

            Worklog Id:     (was: 717544)
    Remaining Estimate: 46h 40m  (was: 46h 50m)
            Time Spent: 1h 20m  (was: 1h 10m)

> Add randomness to default Dataflow job name in Python sdk
> ---------------------------------------------------------
>
>                 Key: BEAM-13760
>                 URL: https://issues.apache.org/jira/browse/BEAM-13760
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Will Nicholson
>            Assignee: Will Nicholson
>            Priority: P2
>   Original Estimate: 48h
>          Time Spent: 1h 20m
>  Remaining Estimate: 46h 40m
>
> Currently, when a Dataflow job is created with the default name in Python,
> the name is a concatenation of the word "beamapp", the username, and a UTC
> timestamp with microsecond precision, as seen
> [here|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L415-L428].
> Therefore, when the same user creates two jobs with the same timestamp, the
> job names collide and the second job fails.
> The Java SDK has already solved this problem by appending a random
> hex string to the job name, seen
> [here|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java#L338-L351].
> The objective of this issue is to align the Python SDK with the Java SDK by
> appending a random string to the default job name.
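A minimal sketch of the proposed behavior, assuming the naming scheme described above ("beamapp", the sanitized username, and a UTC timestamp with microsecond precision) followed by an 8-character random hex suffix. The function name `build_default_job_name` and its `now` parameter are hypothetical. The `now` parameter exists only so the timestamp can be pinned, which shows that two jobs created at the same instant no longer collide. This is not the code in apiclient.py, and it does not enforce any maximum name length.

```python
import re
import uuid
from datetime import datetime, timezone


def build_default_job_name(user_name, now=None):
    # As in the original code: lowercase the username and drop any
    # character outside [-a-z0-9].
    user_name = re.sub('[^-a-z0-9]', '', user_name.lower())
    if now is None:
        now = datetime.now(timezone.utc)
    # Month, day, hour, minute, second, then microseconds.
    date_component = now.strftime('%m%d%H%M%S-%f')
    # 8 random hex characters: two names built with the same timestamp
    # differ unless this 32-bit suffix also happens to match.
    random_component = uuid.uuid4().hex[:8]
    return 'beamapp-{}-{}-{}'.format(user_name, date_component, random_component)


# Pin the timestamp to simulate two jobs launched at the same instant.
fixed = datetime(2022, 1, 29, 13, 52, 0, 123456, tzinfo=timezone.utc)
a = build_default_job_name('Will.Nicholson', now=fixed)
b = build_default_job_name('Will.Nicholson', now=fixed)
print(a)
print(b)
```

Both names share the prefix `beamapp-willnicholson-0129135200-123456-` and differ only in the random suffix.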



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
