claudevdm commented on code in PR #34657:
URL: https://github.com/apache/beam/pull/34657#discussion_r2049696072
##########
sdks/python/apache_beam/io/gcp/bigquery_file_loads.py:
##########
@@ -1101,6 +1101,18 @@ def _load_data(
of the load jobs would fail but not others. If any of them fails, then
copy jobs are not triggered.
"""
+ self.reshuffle_before_load = not util.is_compat_version_prior_to(
+ p.options, "2.65.0")
+ if self.reshuffle_before_load:
+ # Ensure that TriggerLoadJob retry inputs are deterministic by breaking
Review Comment:
Thinking about it more: does Reshuffle force determinism by grouping by
unique IDs?
Without the reshuffle, if more elements destined for a given destination
(the key for GroupFilesByTableDestinations) arrive between retries, could
those new files be materialized for the key, so that
GroupFilesByTableDestinations.read picks up more files than the first
attempt saw?
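The concern above can be sketched with a toy model (plain Python, not Beam; the class and file names are invented for illustration): without a checkpoint, each retry re-reads the live set of files for a key, so files that land between attempts change the retry's input, whereas a Reshuffle-style checkpoint snapshots the element set once and replays exactly that snapshot on retry.

```python
# Toy model of the retry-determinism question (illustrative only, not Beam):
# a per-destination file list that can grow between retries, versus a
# checkpointed snapshot that every retry replays identically -- the effect
# the reshuffle before TriggerLoadJobs is meant to provide.

class GrowingFileList:
    """Simulates the files grouped for one destination; new files may
    appear between retry attempts."""
    def __init__(self):
        self.files = ["gs://bucket/f1", "gs://bucket/f2"]

    def read(self):
        return list(self.files)

source = GrowingFileList()

# Without a checkpoint: each retry re-reads the live source.
attempt1 = source.read()
source.files.append("gs://bucket/f3")   # a new file lands between retries
attempt2 = source.read()
assert attempt1 != attempt2             # retry input changed: non-deterministic

# With a checkpoint (the reshuffle's effect): snapshot once, replay on retry.
checkpoint = source.read()
source.files.append("gs://bucket/f4")   # arrives after the checkpoint
assert checkpoint == attempt2           # the retry replays the snapshot...
assert checkpoint != source.read()      # ...not the live, still-growing source
```

This is only a model of the failure mode, not of Reshuffle's mechanics; in Beam the checkpoint comes from the runner materializing the shuffled data, so downstream retries of TriggerLoadJobs see a stable input set.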
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]