scwhittle commented on code in PR #31206:
URL: https://github.com/apache/beam/pull/31206#discussion_r1599934624
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java:
##########
@@ -668,7 +668,7 @@ public static <T> Read<T> readAvros(Class<T> clazz) {
*/
public static <T> Read<T> readMessagesWithCoderAndParseFn(
Coder<T> coder, SimpleFunction<PubsubMessage, T> parseFn) {
- return Read.newBuilder(parseFn).setCoder(coder).build();
+ return
Read.newBuilder(parseFn).setCoder(coder).setNeedsAttributes(true).build();
Review Comment:
The attributes can add overhead in cases where they are not necessary (this
is why readMessages is different than readMessagesWithAttributes), in
particular with the DataflowRunner (see
[here](https://github.com/apache/beam/blob/bcac88b815e5fdd0a5009027bc6ee1f40c31eb2c/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1934
) so I don't think the default behavior should change.
The simplest would be to just add a new method that is like this but also
adds attributes, could name readMessagesWithAttributesWithCoderAndParseFn for
example.
It's getting pretty gross having all of these variants though when we have
an underlying builder object. Part of the trickiness with just having a
Read.withAttributes() method is that the PubsubMessageCoder depends on
needsattributes/needsMessageId. I think that we could likely make
Read.Builder.build() smarter to infer the correct coder or reject certain
combinations and then we could expose more builder-like methods
Read.withAttributes(), Read.withMessageId(), Read.withOrderingKey()
to use instead of the cross-product of variants.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]