udim commented on a change in pull request #11241: [BEAM-5422] Document DynamicDestinations.getTable uniqueness requirement
URL: https://github.com/apache/beam/pull/11241#discussion_r405231575
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinations.java
##########
@@ -142,7 +142,11 @@ void setSideInputAccessorFromProcessContext(DoFn<?, ?>.ProcessContext context) {
return null;
}
- /** Returns a {@link TableDestination} object for the destination. May not return null. */
+ /**
+ * Returns a {@link TableDestination} object for the destination. May not return null. Return
+ * value needs to be unique to each destination: may not return the same {@link TableDestination}
+ * for different destinations.
Review comment:
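TL;DR: Pablo is right.
In the Python SDK, a user function translates an element to a TableReference.
In the Java SDK, a user DynamicDestinations instance translates an element to a DestinationT, and then translates that DestinationT to a TableDestination.
Java reshuffles on (DestinationT, element) pairs, while Python reshuffles on (TableReference, element) pairs. So in Java the grouping key is the destination, not the table: if getTable returns the same TableDestination for two different destinations, that table's elements end up split across separate groups.
(I'm not sure why Java uses an intermediate DestinationT. Convenience? Better GBK performance? Lower resource use?)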
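A minimal plain-Java sketch of the grouping difference described above, assuming nothing from Beam itself: DestinationGrouping, groupBy, groupsPerTable, the region-prefixed event strings and the project:dataset.events... table specs are all hypothetical. It keys elements by a destination value first (standing in for DestinationT in the Java SDK) and then counts how many destination groups point at each table. With a unique getTable-style mapping each table receives exactly one group; with a shared mapping one table receives two groups, which is the case the new Javadoc rules out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of destination-keyed grouping; this is NOT Beam's API.
public class DestinationGrouping {

  // Groups elements by keyFn's result; LinkedHashMap keeps first-seen key order.
  static <T, K> Map<K, List<T>> groupBy(List<T> elements, Function<T, K> keyFn) {
    Map<K, List<T>> groups = new LinkedHashMap<>();
    for (T element : elements) {
      groups.computeIfAbsent(keyFn.apply(element), k -> new ArrayList<>()).add(element);
    }
    return groups;
  }

  // Keys elements by destination (like DestinationT in the Java SDK), then counts
  // how many distinct destination groups map to each table spec.
  static <T, D> Map<String, Integer> groupsPerTable(
      List<T> elements, Function<T, D> getDestination, Function<D, String> getTable) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (D destination : groupBy(elements, getDestination).keySet()) {
      counts.merge(getTable.apply(destination), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> events = List.of("us-click", "eu-click", "us-view", "eu-view");
    // Hypothetical destination: the region prefix before '-'.
    Function<String, String> getDestination = e -> e.substring(0, e.indexOf('-'));
    // Unique mapping: each region gets its own (made-up) table spec.
    Function<String, String> uniqueTable = region -> "project:dataset.events_" + region;
    // Non-unique mapping: every region maps to the same table spec.
    Function<String, String> sharedTable = region -> "project:dataset.events";

    System.out.println(groupsPerTable(events, getDestination, uniqueTable));
    System.out.println(groupsPerTable(events, getDestination, sharedTable));
  }
}
```

Running main prints {project:dataset.events_us=1, project:dataset.events_eu=1} and then {project:dataset.events=2}. A Python-style pipeline would instead key directly on the table spec, so the shared mapping would collapse into a single group there.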
----------------------------------------------------------------
This is an automated message from the Apache Git Service.