[Proposal] Change to Default PubsubMessage Coder

Evan Galpin Mon, 12 Dec 2022 10:06:51 -0800

Hi folks,

I'd like to solicit feedback on the notion of using
PubsubMessageWithAttributesAndMessageIdAndOrderingKeyCoder[1] as the
default coder for Pubsub messages instead of the current default of
PubsubMessageWithAttributesCoder.


Not long ago, support was added to Beam for reading and writing Pubsub
messages that include an OrderingKey[2]. Part of this change involved adding
a new Coder for PubsubMessage in order to capture and propagate the
orderingKey[1]. This change revealed that, in cases where the coder for
PubsubMessage is inferred, fields like MessageId and OrderingKey can be
accidentally and silently nullified in a way that is not at all obvious to
users[3].

So far, two potential drawbacks of this proposal have been identified:
1. To preserve update compatibility, pipelines using PubsubIO might require
users to explicitly specify the current default coder
(PubsubMessageWithAttributesCoder).
2. Messages would require more bytes to store than with the current default
(which, again, could be overcome by users explicitly specifying the current
default coder).

What other potential drawbacks might there be? I look forward to hearing
others' input!

Thanks,
Evan

[1] https://github.com/apache/beam/pull/22216/files#diff-28243ab1f9eef144e45a9f6cb2e07fa1cf53c021ceaf733d92351254f38712fd
[2] https://github.com/apache/beam/pull/22216
[3] https://github.com/apache/beam/issues/23525

