+1 to looking into using RowCoder, this may help avoid creating more specialized coders in the future (which is mentioned as a pain point in the issue you linked [1]).
[1] https://github.com/apache/beam/issues/23525#issuecomment-1281294275 On Tue, Dec 20, 2022 at 3:00 AM Andrew Pilloud via dev <dev@beam.apache.org> wrote: > I think the Dataflow update concern is the biggest concern and as you > point out that can be easily overcome. I generally believe that users who > aren't setting the coder don't actually care as long as it works, so > changing the default across Beam versions seems relatively low > risk. Another mitigating factor is that both concerns require users to > actually be using the coder (likely via Reshuffle.viaRandomKey) rather than > consuming the output in a fused ParDo (which I think is what we would > recommend). > > As a counter proposal: is there something that stops us from using > RowCoder by default here? I assume all forms of "PubsubMessage" can be > encoded with RowCoder, it provides flexibility for future changes, and > PubSub will be able to take advantage of future work to make RowCoder more > efficient. (If we can't switch to RowCoder, that seems like a useful > feature request for RowCoder.) > > Andrew > > On Mon, Dec 19, 2022 at 3:37 PM Evan Galpin <egal...@apache.org> wrote: > >> Bump 🙃 >> >> Any other risks or drawbacks associated with altering the default coder >> for PubsubMessage to be the most inclusive coder with respect to possible >> fields? >> >> Thanks, >> Evan >> >> >> On Mon, Dec 12, 2022 at 10:06 AM Evan Galpin <egal...@apache.org> wrote: >> >>> Hi folks, >>> >>> I'd like to solicit feedback on the notion of using >>> PubsubMessageWithAttributesAndMessageIdAndOrderingKeyCoder[1] as the >>> default coder for Pubsub messages instead of the current default of >>> PubsubMessageWithAttributesCoder. >>> >>> Not long ago, support for reading and writing Pubsub messages in Beam >>> including an OrderingKey was added[2]. Part of this change involved adding >>> a new Coder for PubsubMessage in order to capture and propagate the >>> orderingKey[1]. This change illuminated that in cases where the coder type >>> for PubsubMessage is inferred, it is possible to accidentally and silently >>> nullify fields like MessageId and OrderingKey in a way that is not at all >>> obvious to users[3]. >>> >>> So far two potential drawbacks of this proposal have been identified: >>> 1. Update compatibility for pipelines using PubsubIO might require users >>> to explicitly specify the current default coder ( >>> PubsubMessageWithAttributesCoder) >>> 2. Messages would require a larger number of bytes to store as compared >>> to the current default (which could again be overcome by users specifying >>> the current default coder) >>> >>> What other potential drawbacks might there be? I look forward to hearing >>> others' input! >>> >>> Thanks, >>> Evan >>> >>> [1] >>> https://github.com/apache/beam/pull/22216/files#diff-28243ab1f9eef144e45a9f6cb2e07fa1cf53c021ceaf733d92351254f38712fd >>> [2] https://github.com/apache/beam/pull/22216 >>> [3] https://github.com/apache/beam/issues/23525 >>> >>