ahmedabu98 commented on code in PR #28272:
URL: https://github.com/apache/beam/pull/28272#discussion_r1316325274


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryResourceNaming.java:
##########
@@ -47,21 +48,22 @@ class BigQueryResourceNaming {
   * @param prefix A prefix generated in {@link BigQueryResourceNaming::createJobIdPrefix}.
    * @param tableDestination A descriptor of the destination table.
    * @param partition A partition number in the destination table.
-   * @param index
-   * @return
+   * @return a generated jobId.
    */
   static String createJobIdWithDestination(
-      String prefix, TableDestination tableDestination, int partition, long index) {
+      String prefix, TableDestination tableDestination, int partition) {
     // Job ID must be different for each partition of each table.
     String destinationHash =
-        Hashing.murmur3_128().hashUnencodedChars(tableDestination.toString()).toString();
-    String jobId = String.format("%s_%s", prefix, destinationHash);
+        Hashing.murmur3_128()
+            .hashUnencodedChars(tableDestination.toString())
+            .toString()
+            .substring(0, 16);
+    // add randomness to jobId to avoid conflict
+    String jobId =
+        String.format("%s_%s_%s", prefix, destinationHash, randomUUIDString().substring(0, 16));

Review Comment:
   I'm worried this may cause data duplication when a bundle is retried.

   Say a bundle fails during load/copy execution after the BQ job itself has already succeeded. Beam will process the bundle again, and these lines will create a fresh job ID. Previously, the job ID was constructed deterministically, so BQ would recognize the retried job as one that recently succeeded and ignore it. Now that the ID is always new, BQ will execute the job again and we will end up with duplicate data.
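
To make the retry concern concrete, here is a minimal, self-contained sketch. It does not use Beam or the BigQuery client: `FakeJobService` is a hypothetical stand-in that models only the behavior described above (a job ID that has already run is not executed again), and `deterministicJobId` / `randomizedJobId` are simplified, illustrative versions of the two naming schemes rather than the actual `BigQueryResourceNaming` code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical stand-in for a job service that skips a job ID it has already run. */
class FakeJobService {
  private final Set<String> seenJobIds = new HashSet<>();
  private int loadsExecuted = 0;

  /** Executes the load only if this job ID has not been submitted before. */
  void submitLoad(String jobId) {
    if (seenJobIds.add(jobId)) {
      loadsExecuted++;
    }
  }

  int loadsExecuted() {
    return loadsExecuted;
  }
}

class JobIdRetrySketch {
  /** Same inputs always produce the same job ID (illustrative scheme, not Beam's). */
  static String deterministicJobId(String prefix, String destination, int partition) {
    return String.format("%s_%08x_%05d", prefix, destination.hashCode(), partition);
  }

  /** Appends a random suffix, so every call produces a different job ID. */
  static String randomizedJobId(String prefix, String destination) {
    String suffix = UUID.randomUUID().toString().replace("-", "").substring(0, 16);
    return String.format("%s_%08x_%s", prefix, destination.hashCode(), suffix);
  }

  /** Simulates one bundle attempt plus one retry; returns how many loads actually ran. */
  static int loadsAfterRetry(boolean deterministic) {
    FakeJobService service = new FakeJobService();
    for (int attempt = 0; attempt < 2; attempt++) {
      String jobId =
          deterministic
              ? deterministicJobId("beam_bq_job", "project:dataset.table", 0)
              : randomizedJobId("beam_bq_job", "project:dataset.table");
      service.submitLoad(jobId);
    }
    return service.loadsExecuted();
  }

  public static void main(String[] args) {
    System.out.println("deterministic ID, loads executed: " + JobIdRetrySketch.loadsAfterRetry(true));
    System.out.println("randomized ID, loads executed: " + JobIdRetrySketch.loadsAfterRetry(false));
  }
}
```

Under these assumptions the deterministic scheme runs the load once across both attempts, while the randomized scheme runs it twice, which is the duplication scenario described in the comment.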


