It would still be a standard coder. The distinction I'm proposing is that certain coders _must_ be implemented by a new runner/SDK (for example windowedvalue, varint, kv, ...) because they are important for SDK-runner communication, whereas now we're also starting to standardize coders that are useful for cross-language and schemas.

On Fri, Sep 27, 2019 at 5:35 PM Chad Dombrova <chad...@gmail.com> wrote:

> Would BooleanCoder continue to fall into this category? I was under the
> impression we might make it a full fledge standard coder with this PR.
>
> On Fri, Sep 27, 2019 at 5:32 PM Brian Hulette <bhule...@google.com> wrote:
>
>> +1, thank you!
>>
>> Note In my Row Coder PR I added a new section for "Additional Standard
>> Coders" - i.e. coders that have a URN, but aren't required for a new
>> runner/sdk to implement the beam model:
>> https://github.com/apache/beam/pull/9188/files#diff-f0d64c2cfc4583bfe2a7e5ee59818ae2R646
>>
>> I think this would belong there as well, assuming that is a
>> distinction we want to make.
>>
>> On Fri, Sep 27, 2019 at 5:22 PM Thomas Weise <t...@apache.org> wrote:
>>
>>> +1 for adding the coder
>>>
>>> Please also add a test here:
>>> https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml
>>>
>>> On Fri, Sep 27, 2019 at 5:17 PM Chad Dombrova <chad...@gmail.com> wrote:
>>>
>>>> Are there any dissenting votes to making a BooleanCoder a standard
>>>> (portable) coder?
>>>>
>>>> I'm happy to make a PR to implement a BooleanCoder in python (and to
>>>> add the Java BooleanCoder to the ModelCoderRegistrar) if everyone agrees
>>>> that this is useful.
>>>>
>>>> -chad
>>>>
>>>> On Fri, Sep 27, 2019 at 3:32 PM Robert Bradshaw <rober...@google.com>
>>>> wrote:
>>>>
>>>>> I think boolean is useful to have. What I'm more skeptical of is
>>>>> adding standard types for variations like UnsignedInteger16, etc. that
>>>>> don't have natural representations in all languages.
>>>>>
>>>>> On Fri, Sep 27, 2019 at 2:46 PM Brian Hulette <bhule...@google.com>
>>>>> wrote:
>>>>> >
>>>>> > Some more context from an offline discussion I had with +Robert
>>>>> > Bradshaw a while ago: We both agreed all of the coders listed in
>>>>> > BEAM-7996 should be implemented in Python, but didn't come to a
>>>>> > conclusion on whether or not they should actually be _standard_
>>>>> > coders, versus just being implicitly standard as part of row coder.
>>>>> >
>>>>> > On Fri, Sep 27, 2019 at 2:29 PM Kenneth Knowles <k...@apache.org>
>>>>> > wrote:
>>>>> >>
>>>>> >> Yes, noted here:
>>>>> >> https://github.com/apache/beam/pull/9188/files#diff-f0d64c2cfc4583bfe2a7e5ee59818ae2R678
>>>>> >> and that links to https://issues.apache.org/jira/browse/BEAM-7996
>>>>> >>
>>>>> >> Kenn
>>>>> >>
>>>>> >> On Fri, Sep 27, 2019 at 12:57 PM Reuven Lax <re...@google.com>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Java has one, implemented as a byte coder. My guess is that nobody
>>>>> >>> has gotten around to implementing it yet for portability.
>>>>> >>>
>>>>> >>> On Fri, Sep 27, 2019 at 12:44 PM Chad Dombrova <chad...@gmail.com>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> Hi all,
>>>>> >>>> It seems a bit unfortunate that there isn’t a portable way to
>>>>> >>>> serialize a boolean value.
>>>>> >>>>
>>>>> >>>> I’m working on porting my external PubsubIO PR over to use the
>>>>> >>>> improved schema-based external transform API in python, but
>>>>> >>>> because of this limitation I can’t use boolean values. For
>>>>> >>>> example, this fails:
>>>>> >>>>
>>>>> >>>> ReadFromPubsubSchema = typing.NamedTuple(
>>>>> >>>>     'ReadFromPubsubSchema',
>>>>> >>>>     [
>>>>> >>>>         ('topic', typing.Optional[unicode]),
>>>>> >>>>         ('subscription', typing.Optional[unicode]),
>>>>> >>>>         ('id_label', typing.Optional[unicode]),
>>>>> >>>>         ('with_attributes', bool),
>>>>> >>>>         ('timestamp_attribute', typing.Optional[unicode]),
>>>>> >>>>     ]
>>>>> >>>> )
>>>>> >>>>
>>>>> >>>> It fails because coders.get_coder(bool) returns the non-portable
>>>>> >>>> pickle coder.
>>>>> >>>>
>>>>> >>>> In the short term I can hack something into the external
>>>>> >>>> transform API to use varint coder for bools, but this kind of
>>>>> >>>> hacky approach to portability won’t work in scenarios where
>>>>> >>>> round-tripping is required without user intervention. In other
>>>>> >>>> words, in python it is not uncommon to test if x is True, in
>>>>> >>>> which case the integer 1 would fail this test. All of that is to
>>>>> >>>> say that a BooleanCoder would be a convenient way to ensure the
>>>>> >>>> proper type is used everywhere.
>>>>> >>>>
>>>>> >>>> So, I was just wondering why it’s not there? Are there concerns
>>>>> >>>> over whether booleans are universal enough to make part of the
>>>>> >>>> portability standard?
>>>>> >>>>
>>>>> >>>> -chad
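
For reference, the round-tripping concern raised at the bottom of the thread can be reproduced without Beam at all. The sketch below is only an illustration: the function names (encode_bool, decode_bool, encode_as_int, decode_as_int) are hypothetical and are not Beam APIs, and the single-byte 0x00/0x01 encoding is an assumption, chosen because the thread describes the Java coder as a byte coder. It is not the encoding the thread settled on.

```python
# Hypothetical helpers for illustration only; none of these are Beam APIs.

def encode_bool(value: bool) -> bytes:
    """Encode a bool as a single byte: 0x01 for True, 0x00 for False."""
    return b"\x01" if value else b"\x00"


def decode_bool(encoded: bytes) -> bool:
    """Decode a single byte back into a real bool (not an int)."""
    if encoded not in (b"\x00", b"\x01"):
        raise ValueError("expected a single 0x00 or 0x01 byte, got %r" % (encoded,))
    return encoded == b"\x01"


def encode_as_int(value: bool) -> bytes:
    """Stand-in for the varint workaround: the bool is written as an integer.

    For the values 0 and 1 a varint is also a single byte.
    """
    return bytes([int(value)])


def decode_as_int(encoded: bytes) -> int:
    """The integer decoder cannot know the value started out as a bool."""
    return encoded[0]


round_tripped_bool = decode_bool(encode_bool(True))
round_tripped_int = decode_as_int(encode_as_int(True))

print(round_tripped_bool is True)  # identity check on the decoded bool
print(round_tripped_int is True)   # identity check on the decoded integer
print(round_tripped_int == True)   # equality check on the decoded integer
```

The bytes on the wire are identical in both cases; what differs is the type the decoder hands back. The integer still compares equal to True, so the mismatch only shows up in code that tests identity with `is True`.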