damccorm commented on code in PR #34657:
URL: https://github.com/apache/beam/pull/34657#discussion_r2049623830


##########
sdks/python/apache_beam/io/gcp/bigquery_file_loads.py:
##########
@@ -1101,6 +1101,18 @@ def _load_data(
          of the load jobs would fail but not other. If any of them fails, then
          copy jobs are not triggered.
     """
+    self.reshuffle_before_load = not util.is_compat_version_prior_to(
+        p.options, "2.65.0")
+    if self.reshuffle_before_load:
+      # Ensure that TriggerLoadJob retry inputs are deterministic by breaking

Review Comment:
   Where does the non-determinism currently come from? If I'm reading things
correctly, the preceding transform
([PartitionFiles](https://github.com/apache/beam/blob/56b286cefabcaefe551785a048ff4413e79722a8/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L1285))
is the only thing between this and the last fusion break (the GBK). I think
that transform should be deterministic since it's operating per-element, but
it's possible I'm missing something.
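To make the per-element argument concrete, here is a minimal sketch of a greedy, size-based file packing. It is a hypothetical, simplified stand-in for a PartitionFiles-style `process` body, not Beam's actual code. The names `partition_files`, `max_size` and `max_files` are illustrative assumptions. The sketch shows that such a function is a pure function of its input: identical input in the same order always yields identical partitions. However, the same files arriving in a different iteration order produce different partitions.

```python
from typing import Iterable, List, Tuple


def partition_files(
    files: Iterable[Tuple[str, int]], max_size: int, max_files: int
) -> List[List[str]]:
    """Greedily packs (path, size) pairs into partitions, in iteration order.

    Hypothetical, simplified stand-in for a PartitionFiles-style DoFn body;
    not Beam's actual implementation.
    """
    partitions: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        fits = current_size + size <= max_size and len(current) < max_files
        if current and not fits:
            # The file does not fit: close the current partition, start anew.
            partitions.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        partitions.append(current)
    return partitions


files = [("a", 60), ("b", 50), ("c", 40)]

# Same files in the same order: identical output on every call.
print(partition_files(files, max_size=100, max_files=10))  # [['a'], ['b', 'c']]
# Same files in a different order: different partitions.
print(partition_files(list(reversed(files)), max_size=100, max_files=10))  # [['c', 'b'], ['a']]
```

The per-element reasoning therefore holds only if a retry sees the grouped files in the same order. Beam does not guarantee the order of values within a GroupByKey output. Whether a retry can actually observe a different order is runner-dependent.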


