MiguelAnzoWizeline commented on a change in pull request #14811:
URL: https://github.com/apache/beam/pull/14811#discussion_r657468517



##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/BatchSpannerRead.java
##########
@@ -73,18 +74,16 @@ public static BatchSpannerRead create(
         .apply(
             "Generate Partitions",
             ParDo.of(new GeneratePartitionsFn(getSpannerConfig(), txView)).withSideInputs(txView))
-        .apply("Shuffle partitions", Reshuffle.<Partition>viaRandomKey())
         .apply(
             "Read from Partitions",
             ParDo.of(new ReadFromPartitionFn(getSpannerConfig(), txView)).withSideInputs(txView));
   }
 
   @VisibleForTesting
-  static class GeneratePartitionsFn extends DoFn<ReadOperation, Partition> {
+  static class GeneratePartitionsFn extends DoFn<ReadOperation, List<Partition>> {

Review comment:
       Hi @boyuanzz 
   Coming back to the idea in this issue: I looked for an example of using side 
inputs in the SplitRestriction and GetInitialRestriction methods, but I haven't 
found any. If it's not possible to use the side input there to estimate the 
amount of data, I think the best way to get a good estimate of the amount of 
data to split would be to keep the logic in two DoFns, since the first one gives 
us the Partitions and the size of the list of partitions that we can work with.





