claudevdm commented on code in PR #34657:
URL: https://github.com/apache/beam/pull/34657#discussion_r2052952994


##########
sdks/python/apache_beam/io/gcp/bigquery_file_loads.py:
##########
@@ -1101,6 +1101,18 @@ def _load_data(
          of the load jobs would fail but not other. If any of them fails, then
          copy jobs are not triggered.
     """
+    self.reshuffle_before_load = not util.is_compat_version_prior_to(

Review Comment:
   > This reshuffle should be added outside of this transform. Make it the 
responsibility of the caller to ensure stable inputs.
   
   Apologies, but I don't fully understand. 
   
https://github.com/apache/beam/blob/d0def26d3ec3f120ef687a80a33d0645a22f30e9/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L866
 is the transform, and 
https://github.com/apache/beam/blob/d0def26d3ec3f120ef687a80a33d0645a22f30e9/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L1080
 is a method that builds part of the BigQueryBatchFileLoads transform.
   
   
https://github.com/apache/beam/blob/d0def26d3ec3f120ef687a80a33d0645a22f30e9/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L609
 is the DoFn that requires stable inputs.
   
   So I am adding the reshuffle right before the DoFn that requires stable 
inputs (in _load_data). 
   
   Are you suggesting that I create a new PTransform that wraps TriggerLoadJobs, 
where expand_2_264_0() just returns ParDo(TriggerLoadJobs) and expand() returns 
Reshuffle() | ParDo(TriggerLoadJobs)?
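
To make the question concrete, here is a minimal plain-Python sketch of that wrapper idea. It deliberately does not import Beam: the returned step lists only stand in for real transforms. The names `is_prior_to`, `expand_legacy`, and `TriggerLoadJobsWithStableInput`, as well as the `2.65.0` cutoff, are hypothetical and not taken from the PR.

```python
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.64.0' into (2, 64, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_prior_to(compat_version: Optional[str], breaking_version: str) -> bool:
    """Hypothetical stand-in for the compatibility check used in the PR.

    True only when the pipeline pins a compatibility version older than
    the version that introduces the behavior change.
    """
    if compat_version is None:
        # Nothing pinned: assume the newest behavior is wanted.
        return False
    return parse_version(compat_version) < parse_version(breaking_version)


class TriggerLoadJobsWithStableInput:
    """Hypothetical wrapper; strings stand in for real Beam transforms."""

    # Assumed cutoff for illustration only, not the value used in the PR.
    BREAKING_VERSION = "2.65.0"

    def __init__(self, compat_version: Optional[str] = None):
        self.compat_version = compat_version

    def expand_legacy(self) -> List[str]:
        # Old graph shape: the DoFn alone, so pinned pipelines stay unchanged.
        return ["ParDo(TriggerLoadJobs)"]

    def expand(self) -> List[str]:
        if is_prior_to(self.compat_version, self.BREAKING_VERSION):
            return self.expand_legacy()
        # New graph shape: reshuffle first so the DoFn sees stable inputs.
        return ["Reshuffle()", "ParDo(TriggerLoadJobs)"]
```

With this shape, the version check lives in the wrapper rather than in _load_data, and callers only ever apply the wrapper.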



