Some more context from an offline discussion I had with +Robert Bradshaw <rober...@google.com> a while ago: we both agreed that all of the coders listed in BEAM-7996 should be implemented in Python, but we didn't reach a conclusion on whether they should actually be _standard_ coders or just be implicitly standard as part of the row coder.

On Fri, Sep 27, 2019 at 2:29 PM Kenneth Knowles <k...@apache.org> wrote:

> Yes, noted here:
> https://github.com/apache/beam/pull/9188/files#diff-f0d64c2cfc4583bfe2a7e5ee59818ae2R678
> and that links to https://issues.apache.org/jira/browse/BEAM-7996
>
> Kenn
>
> On Fri, Sep 27, 2019 at 12:57 PM Reuven Lax <re...@google.com> wrote:
>
>> Java has one, implemented as a byte coder. My guess is that nobody has
>> gotten around to implementing it yet for portability.
>>
>> On Fri, Sep 27, 2019 at 12:44 PM Chad Dombrova <chad...@gmail.com> wrote:
>>
>>> Hi all,
>>> It seems a bit unfortunate that there isn’t a portable way to serialize
>>> a boolean value.
>>>
>>> I’m working on porting my external PubsubIO PR over to use the improved
>>> schema-based external transform API in python, but because of this
>>> limitation I can’t use boolean values. For example, this fails:
>>>
>>> ReadFromPubsubSchema = typing.NamedTuple(
>>>     'ReadFromPubsubSchema',
>>>     [
>>>         ('topic', typing.Optional[unicode]),
>>>         ('subscription', typing.Optional[unicode]),
>>>         ('id_label', typing.Optional[unicode]),
>>>         ('with_attributes', bool),
>>>         ('timestamp_attribute', typing.Optional[unicode]),
>>>     ]
>>> )
>>>
>>> It fails because coders.get_coder(bool) returns the non-portable pickle
>>> coder.
>>>
>>> In the short term I can hack something into the external transform API
>>> to use varint coder for bools, but this kind of hacky approach to
>>> portability won’t work in scenarios where round-tripping is required
>>> without user intervention. In other words, in python it is not uncommon to
>>> test if x is True, in which case the integer 1 would fail this test.
>>> All of that is to say that a BooleanCoder would be a convenient way to
>>> ensure the proper type is used everywhere.
>>>
>>> So, I was just wondering why it’s not there? Are there concerns over
>>> whether booleans are universal enough to make part of the portability
>>> standard?
>>>
>>> -chad
>>>
>>
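
To make the round-tripping concern in the quoted thread concrete, here is a minimal standalone sketch. It does not use Beam's coder classes; the helper names (encode_bool, decode_bool, encode_varint, decode_varint) are hypothetical, and the one-byte 0x00/0x01 layout is an assumption based on the "byte coder" description above, not a statement of what the portable spec defines.

```python
def encode_bool(value):
    """Hypothetical helper: encode a bool as one byte (assumed layout)."""
    if not isinstance(value, bool):
        raise TypeError(f"expected bool, got {type(value).__name__}")
    return b"\x01" if value else b"\x00"


def decode_bool(data):
    """Hypothetical helper: decode one byte back into a real bool."""
    if data == b"\x01":
        return True
    if data == b"\x00":
        return False
    raise ValueError(f"invalid boolean encoding: {data!r}")


def encode_varint(value):
    """Hypothetical helper: unsigned base-128 varint, as a stand-in for
    the varint workaround (a bool is silently treated as the int 0 or 1)."""
    value = int(value)
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.append(bits | 0x80)  # more bytes follow
        else:
            out.append(bits)
            return bytes(out)


def decode_varint(data):
    """Hypothetical helper: decode an unsigned varint into an int."""
    result = 0
    shift = 0
    for byte in bytearray(data):
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return result


via_bool = decode_bool(encode_bool(True))
via_varint = decode_varint(encode_varint(True))

# The dedicated encoding preserves the type; the varint workaround does not.
print(type(via_bool).__name__, via_bool is True)
print(type(via_varint).__name__, via_varint is True)
```

Both encodings happen to produce the same single byte for True in this sketch; the difference is only on the decode side, where the varint path hands back an int, so an identity check against True no longer passes.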