claudevdm commented on code in PR #34657:
URL: https://github.com/apache/beam/pull/34657#discussion_r2049696072


##########
sdks/python/apache_beam/io/gcp/bigquery_file_loads.py:
##########
@@ -1101,6 +1101,18 @@ def _load_data(
          of the load jobs would fail but not others. If any of them fails, then
          copy jobs are not triggered.
     """
+    self.reshuffle_before_load = not util.is_compat_version_prior_to(
+        p.options, "2.65.0")
+    if self.reshuffle_before_load:
+      # Ensure that TriggerLoadJob retry inputs are deterministic by breaking

Review Comment:
   Thinking about it more, does Reshuffle force determinism by grouping on
unique IDs?
   
   Without the reshuffle, if more elements destined for a given destination
(the key for GroupFilesByTableDestinations) arrive between retries, could
those new files be materialized under that key, so that
GroupFilesByTableDestinations.read ends up reading more files than the
original attempt did?
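   
   A minimal, hypothetical sketch (not the PR's actual code) of the
checkpoint-before-retry idea being asked about: placing a Reshuffle() after
the grouping step materializes the grouped (destination, files) pairs, so a
retried downstream step replays the same snapshot instead of regrouping and
possibly picking up newly arrived files. The pipeline, element values, and
step names below are made up for illustration.

```python
import apache_beam as beam

with beam.Pipeline() as p:
  files = p | beam.Create([
      # Hypothetical (destination, file) pairs.
      ('project:dataset.table', 'gs://bucket/file1'),
      ('project:dataset.table', 'gs://bucket/file2'),
  ])
  grouped = (
      files
      | 'GroupFilesByDestination' >> beam.GroupByKey()
      # Reshuffle materializes the grouped output as a checkpoint; retries of
      # the next step replay this stable snapshot rather than a fresh
      # regrouping that could include files that arrived later.
      | 'StableInput' >> beam.Reshuffle())
  grouped | 'PrintGroups' >> beam.Map(print)
```

   Without the Reshuffle, a retry of a fused stage could re-run the grouping
and observe additional files, which is the concern raised above; the sketch
only illustrates that idea, not the PR's implementation.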


